ABSTRACT


The relationship between the structure of a neural network and its ability to
perform nonlinear mapping is analyzed. A new algorithm, called the conjugate gradient
optimization method, for calculating the weights and thresholds of a neural network is
presented. The performance of the conjugate gradient algorithm is then compared to the
well known backpropagation method and shown to be more computationally efficient.
A neural network using the conjugate gradient algorithm is then applied to three simple
examples to demonstrate its signal processing capabilities. The first example illustrates
the ability of the neural network to perform classification. The second compares the
performance of a one-step linear predictor to a neural network for a nonlinear chaotic
time series. The neural network predictor is shown to provide much greater accuracy
than its linear counterpart. The final application presented demonstrates the ability of
a neural network to perform channel equalization for a nonminimum phase channel. Its
performance is then compared to its linear equivalent.

I. INTRODUCTION


Artificial neural networks have been studied for many years in the hope of 
achieving human-like performance. Neural networks consist of highly connected sets 
of relatively simple processing elements. Computations are performed collectively by 
the entire network with the activity distributed over all the processing elements. This 
parallel distributed processing provides neural networks with the potential to solve
complex problems more quickly than present serial methods.
The nonlinear nature and simple structure of neural networks provide a formalism 
for the study of nonlinear signal processing. 

The application of neural networks to signal processing involves developing an 
understanding of the relationship between the structure of a neural network and its 
ability to perform the desired input-to-output mapping. A neural network's structure
is defined by the number and type of processing elements in the network, the values
of the weights that connect the processing elements together, and a threshold value
associated with each processing element. Past work has led to a large variety of
neural network models. The models include the Hopfield network, the single- and
multi-layer perceptron networks, the reduced Coulomb energy (RCE) classifier, and
the adaptive resonance theory (ART) model [Ref. 1:pp. 65-73]. Each model differs
in its structure and the manner in which the weights and thresholds of the network
are derived. One current method for calculating the weights and thresholds of a
feedforward multilayer neural network, called the backpropagation method, uses a
steepest descent method to iteratively adapt the weights and thresholds of the network
[Ref. 2:p. 121]. This method has generally been shown to be slow to converge to the
optimal set of weights and thresholds for a given problem [Ref. 1:p. 300]. The
objectives of this thesis research were therefore:


• Investigate the relationship between the structure of a neural network and its
  ability to perform input-output mapping.

• Develop an alternative to the backpropagation method that converges more
  quickly to the optimal set of weights and thresholds for any given problem.

• Compare the performance of a neural network to its linear counterpart for some
  representative signal processing applications.


Chapter II provides a general overview of the theory of neural networks. A 
graphical approach is employed to demonstrate the ability of neural networks to 
perform nonlinear mapping for various network configurations. The results are then 
related to a theorem by Kolmogorov. The backpropagation method for calculating 
the weights and thresholds of the neural network is also introduced. 

Chapter III deals with the derivation of an alternative algorithm to the
backpropagation method for calculating the weights and thresholds of a neural network.
The conjugate gradient optimization method is presented and then applied to the
neural network model. The Fibonacci line search method used in conjunction with the
conjugate gradient method is also discussed. The final section of the chapter presents
details concerning actual implementation of the algorithm, including experimentally
derived parameters.

Chapter IV presents the results of the thesis research. The conjugate gradient
algorithm's performance is compared to the backpropagation method and is shown
to be more computationally efficient. A neural network using the conjugate
gradient algorithm is then applied to three simple examples to validate the performance
of the new algorithm and to demonstrate the types of tasks that a neural network
can perform. The first example illustrates the neural network's ability to perform
classification. A two input neural network is successfully "taught" to differentiate
between sets of points falling inside and outside a circle. The second example compares
the performance of a one-step linear predictor to a neural network for a nonlinear
chaotic time series generated using the Feigenbaum logistic function. This
application demonstrates the nonlinear mapping ability of the neural network. The neural
network predictor is shown to provide much greater accuracy than its linear
counterpart. The final application presented demonstrates the ability of a neural network to
perform channel equalization for a nonminimum phase channel. Its performance is
compared to its linear equivalent and is shown to provide superior performance.

Chapter V contains the overall conclusions of the thesis research and provides
recommendations for future research.


II. FUNDAMENTALS - HOW NEURAL 
NETWORKS WORK 


A. THE BASIC BUILDING BLOCK 
A neural network is a system of relatively simple processing elements whose
function is determined by its network structure, connection weights, and the transfer
function of each neuron. Figure 2.1 shows a single artificial neuron, the fundamental
building block for all neural networks. A set of inputs $x_1, x_2, \ldots, x_n$ is applied
through a set of associated connection weights $w_1, w_2, \ldots, w_n$ to the neuron.


Figure 2.1: A single artificial neuron 


The inputs correspond to the stimulation levels and the weights to the synaptic
strengths of a biological neuron. The neuron sums the weighted inputs, adds a
threshold value, and applies the result to the neuron's transfer function f(x). This
operation can be expressed as

$$z = f\left(\sum_{i=1}^{n} w_i x_i + \theta\right) \qquad (2.1)$$

or in vector notation

$$z = f(\mathbf{w}^T \mathbf{x} + \theta) \qquad (2.2)$$

where $\mathbf{x}$ is a column vector of inputs, $\mathbf{w}$ the corresponding column vector of weights,
and $\theta$ the neuron's threshold value.
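
The operation in equations 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration: the function names, and the choice of +1 and -1 as the two signum output levels, are assumptions made here and are not taken from the thesis.

```python
import math

def signum(a):
    # Two-level output; the levels +1 and -1 are an assumption of this sketch.
    return 1.0 if a >= 0.0 else -1.0

def linear(a):
    return a

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(inputs, weights, threshold, transfer):
    """Return z = f(w^T x + theta) for a single artificial neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs plus the threshold value.
    activation = sum(w * x for w, x in zip(weights, inputs)) + threshold
    return transfer(activation)

if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [0.5, -0.25]
    for f in (signum, linear, sigmoid):
        print(f.__name__, neuron_output(x, w, 0.1, f))
```

The same weighted sum feeds every transfer function; only the final mapping `transfer` changes the character of the neuron.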


B. THE TRANSFER FUNCTION 

A number of possibilities arise for selection of an appropriate transfer function.
These include most notably: the signum function, the linear function, and the sigmoid
function. Initial research conducted in the 1950's and 1960's by Rosenblatt, Minsky
and others used the signum function shown in Figure 2.2 [Ref. 3]. The signum function
will be used for a preliminary discussion of how neural networks operate.





Figure 2.2: Signum function 


Artificial neurons using the signum transfer function were referred to as
perceptrons [Ref. 3]. The signum transfer function causes the output of the perceptron to
take one of two discrete values. The point at which the neuron switches from low to
high or high to low is determined by the input weights and the perceptron's
threshold value. It has been shown that a single perceptron has the ability to distinguish
between two classes of inputs [Ref. 4:p. 13]. This is demonstrated in Figure 2.3 for a
two input network.




The combination of weights ($w_1$ and $w_2$) and the offset ($\theta$) define a line where the
output of the network ($z$) is high for the class of inputs falling on one side of the line
and low for the second class of inputs falling on the other side. If there are n inputs
to a single perceptron, as pictured in Figure 2.1, the perceptron can construct an n
dimensional hyperplane separating the two classes of inputs. Input classes that cannot
be separated by a simple hyperplane therefore cannot be accurately differentiated by
a single perceptron.
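
The limitation of a single perceptron can be illustrated with a small experiment. The sketch below is an assumption-laden illustration rather than anything from the thesis: it uses +1 and -1 as the two output levels, and a coarse grid of candidate weights chosen only for demonstration. It searches for a line that separates the logical AND classes (which exists) and the exclusive-OR classes (for which no line exists).

```python
def perceptron(x1, x2, w1, w2, theta):
    """Two-input signum perceptron; output levels +1/-1 are assumed here."""
    return 1 if w1 * x1 + w2 * x2 + theta >= 0 else -1

def separates(points, labels, w1, w2, theta):
    """True if the line defined by (w1, w2, theta) classifies every point."""
    return all(perceptron(x1, x2, w1, w2, theta) == label
               for (x1, x2), label in zip(points, labels))

def search_weights(points, labels, low=-4.0, high=4.0, steps=17):
    """Brute-force search over a coarse grid of weights and thresholds."""
    grid = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    for w1 in grid:
        for w2 in grid:
            for theta in grid:
                if separates(points, labels, w1, w2, theta):
                    return (w1, w2, theta)
    return None  # no separating line found on this grid

POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [-1, -1, -1, 1]   # separable by a single line
XOR_LABELS = [-1, 1, 1, -1]    # not separable by any single line

if __name__ == "__main__":
    print("AND:", search_weights(POINTS, AND_LABELS))
    print("XOR:", search_weights(POINTS, XOR_LABELS))
```

The grid search is only a demonstration device; the exclusive-OR case fails for every choice of weights, not just those on the grid.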

This problem can be remedied by cascading the perceptrons into several layers.
This type of network topology is called a feedforward network because the output
from the previous layer is fed forward to only the neurons in the next layer of the
network. By adding additional layers, more complex boundaries can be defined. A
two layer network is capable of defining decision regions that are convex or concave
in shape. For the two input case shown in Figure 2.4, each perceptron in the first
layer defines a boundary line. A single second layer perceptron weights and combines
the outputs from the first layer perceptrons to produce the two decision regions. As
pictured in Figure 2.4, a two layer network can also define a single enclosed region.
With the addition of a third layer, disjoint enclosed regions can be combined to create
a decision map of any arbitrary complexity, given a sufficient number of perceptrons
in each layer. This is illustrated in Figure 2.5.

The performance of a multilayer perceptron network using the signum transfer
function is satisfactory provided the desired output from the network is limited to two
discrete values (i.e., high or low). This would be appropriate for a binary classifier
system, where each output would represent one of two classes, i.e., a binary value.
It does not, however, provide sufficient resolution for analog (continuously valued) or
the corresponding discrete valued output functions associated with most other signal
processing applications.

Figure 2.3: Single neuron and associated decision regions

Figure 2.4: Two layer network and associated decision regions

Figure 2.5: Three layer network and associated decision regions

One example of a transfer function that would be capable of providing such a
continuously variable output is the linear transfer function. In this case, the output of
the artificial neuron would simply be the weighted sum of the inputs plus the neuron's
threshold value. This can be expressed as

$$z = \sum_{i=1}^{n} w_i x_i + \theta \qquad (2.3)$$

or in vector notation

$$z = \mathbf{w}^T \mathbf{x} + \theta. \qquad (2.4)$$

This is the transfer function used by Widrow and Hoff in their development of the
adaptive linear (adaline) and multiple adaptive linear (madaline) filters [Ref. 5:p.
10]. A great deal has been written concerning research and applications of the
adaptive linear filter, although it has not often been referred to as a neural model
[Ref. 6], [Ref. 7], [Ref. 8]. One key feature of the linear neural network is that there is
no functional difference between a multilayer and a single layer network. For example,
for the simple two layer network in Figure 2.6 the output of the first layer neurons
can be written as

$$f_1(x_1, x_2) = w_1 x_1 + w_2 x_2 + \theta_1 \qquad (2.5)$$

and

$$f_2(x_1, x_2) = w_3 x_1 + w_4 x_2 + \theta_2. \qquad (2.6)$$

The output of the network can then be written as

$$f_3(x_1, x_2) = w_5 f_1(x_1, x_2) + w_6 f_2(x_1, x_2) + \theta_3. \qquad (2.7)$$

After some algebraic manipulation and substitution, the final result is

$$f_3(x_1, x_2) = (w_5 w_1 + w_6 w_3) x_1 + (w_5 w_2 + w_6 w_4) x_2 + (w_5 \theta_1 + w_6 \theta_2 + \theta_3). \qquad (2.8)$$

From the above discussion, it is clear that, regardless of the number of layers in the
network, the network can always be reduced to a single layer network. Essentially
then, the linear adaptive filter is nothing more than the linear version of a single layer
neural network.
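
The collapse of a two layer linear network into a single layer can be checked numerically. The sketch below assumes a particular labelling that is not guaranteed to match Figure 2.6: the first layer neurons use weights (w1, w2) and (w3, w4) with thresholds theta1 and theta2, and the second layer neuron uses weights (w5, w6) with threshold theta3. The function names are hypothetical.

```python
def two_layer_linear(x1, x2, w, theta):
    """Evaluate the two layer linear network layer by layer.

    w = (w1, ..., w6) and theta = (theta1, theta2, theta3); this labelling
    is an assumption of the sketch.
    """
    w1, w2, w3, w4, w5, w6 = w
    t1, t2, t3 = theta
    f1 = w1 * x1 + w2 * x2 + t1      # first layer, neuron 1
    f2 = w3 * x1 + w4 * x2 + t2      # first layer, neuron 2
    return w5 * f1 + w6 * f2 + t3    # second layer neuron

def equivalent_single_layer(w, theta):
    """Return (weight on x1, weight on x2, threshold) of the collapsed network."""
    w1, w2, w3, w4, w5, w6 = w
    t1, t2, t3 = theta
    return (w5 * w1 + w6 * w3,
            w5 * w2 + w6 * w4,
            w5 * t1 + w6 * t2 + t3)

if __name__ == "__main__":
    w = (0.5, -1.0, 2.0, 0.25, 1.5, -0.75)
    theta = (0.1, -0.2, 0.3)
    a, b, c = equivalent_single_layer(w, theta)
    for x1, x2 in [(0.0, 0.0), (1.0, -2.0), (3.5, 0.5)]:
        print(two_layer_linear(x1, x2, w, theta), a * x1 + b * x2 + c)
```

Both columns printed by the usage example agree to within rounding, which is the numerical counterpart of the substitution carried out in the text.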

A third transfer function, which has been recently popularized by Rumelhart et
al. [Ref. 9], is called the sigmoid function. It is defined by the equation

$$f(s) = \frac{1}{1 + e^{-s}} \qquad (2.9)$$

where

$$s = \mathbf{w}^T \mathbf{x} + \theta. \qquad (2.10)$$

The sigmoid function, pictured in Figure 2.7, has a shape which would appear to fall
somewhere between the linear transfer function and the signum transfer function.
Its output is limited to a continuous range of values between zero and one. For values
of $s$ near zero, the transfer function behaves in an approximately linear fashion with a
nearly constant slope (one quarter for the function in equation 2.9). If the input
weights to the neuron are kept sufficiently small and the range
of input values limited, the sigmoidal artificial neuron can be made to appear linear.
Likewise, by using large values for the input weights $\mathbf{w}$, the values for $s$ would vary
more rapidly and the sigmoidal artificial neuron would more closely approximate the
signum function. As a result, the output of the network can be made to approximate
both linear and nonlinear combinations of the inputs depending on the values of the
network's weights ($\mathbf{w}$) and thresholds ($\theta$).
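
The two limiting behaviours of the sigmoidal neuron can be seen numerically. The sketch below assumes the standard logistic form 1/(1 + exp(-s)) for the sigmoid and a single scalar input; the function names and the specific weight values are illustrative assumptions only.

```python
import math

def sigmoid(s):
    """Logistic sigmoid, written to avoid overflow for large |s|."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def sigmoid_neuron(x, w, theta=0.0):
    """Single-input sigmoidal neuron: f(w * x + theta)."""
    return sigmoid(w * x + theta)

def linear_approximation(x, w, theta=0.0):
    """First-order expansion of the logistic sigmoid about s = 0."""
    return 0.5 + 0.25 * (w * x + theta)

if __name__ == "__main__":
    for x in (-0.2, -0.1, 0.1, 0.2):
        small = sigmoid_neuron(x, w=0.5)     # nearly linear regime
        large = sigmoid_neuron(x, w=200.0)   # nearly a step (signum-like)
        print(x, small, linear_approximation(x, 0.5), large)
```

With the small weight the neuron output tracks the straight-line approximation closely; with the large weight the output saturates near zero or one on either side of the switching point.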

A theorem developed by Kolmogorov and described in Reference 10 provides
further insight into the potential capabilities of a multilayer sigmoidal neural network.
The theorem states that any continuous function of n variables can be represented
using only linear summations and nonlinear but continuously increasing functions of
only one variable. This would indicate that a three layer artificial neuron feedforward
network using a sigmoidal transfer function is capable of representing any continuous
nonlinear multivariable function. The theorem, however, does not indicate the number of
neurons required in each layer, or how the values for the weights should be derived.

Figure 2.6: Two layer linear network

Figure 2.7: Neuron transfer functions

It has been suggested that one approach to representing an n-dimensional
nonlinear function using neural networks might be by a weighted combination of
n-dimensional 'bumps' [Ref. 11]. This is somewhat analogous to the Fourier series
representation of an arbitrary signal where weighted combinations of sinusoids of
suitable frequencies are used. To see how a nonlinear function might be represented
using a sigmoidal neural network, let us look at the case where we have a nonlinear
function of two variables. The output of the nonlinear function could be interpreted
as a two dimensional surface in a three dimensional space. The output of a single
sigmoidal neuron would have a surface like that pictured in Figure 2.8.







Figure 2.8: A sigmoid surface 

The orientation of the rising slope of the sigmoidal surface is determined by the
neuron's input weights ($\mathbf{w}$). Its position is determined by its threshold ($\theta$) value.
The height of the surface is controlled by the weight connected to the output of the
neuron. If we add a second neuron with the same orientation, but a slightly different
position than the first by using a different threshold value ($\theta$), and use an output
weight equal to but opposite in sign of the first, we can form a ridge as shown in
Figure 2.9.

Figure 2.9: A ridge


A second ridge, perpendicular to the first, can then be constructed by adding
two additional neurons to the first layer and selecting appropriate input weight values.
The sum of the two ridges then forms the surface pictured in Figure 2.10. The weights
connecting the outputs of the first layer neurons to the single second layer neuron
along with the second layer neuron's threshold value can then be adjusted to yield a
true bump shown in Figure 2.11.

We can now represent any surface as a combination of these bumps. The network
topology to accomplish this would consist of multiple copies of the two layer network and
a single third layer neuron to weight and sum the bumps. The resulting surface is
pictured in Figure 2.12. The preceding development provides some insight into the
number of neurons required in each layer of a neural network to adequately represent
a given nonlinear function. A given function might be more efficiently represented
using a combination of sigmoidal surfaces or ridges rather than bumps. The better
knowledge one has of the function to be represented will lead to a better decision
concerning the neural network topology required.

Figure 2.12: Multiple bumps
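
The ridge and bump construction described above can be sketched as follows. All numerical values (steepness, half-width, second layer gain and threshold) are assumptions picked for illustration, not parameters from the thesis, and the function names are hypothetical.

```python
import math

def sigmoid(s):
    """Logistic sigmoid, written to avoid overflow for large |s|."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def ridge(u, steepness=10.0, half_width=1.0):
    """Two first layer neurons with the same input weight but different
    thresholds, combined with output weights +1 and -1."""
    rising = sigmoid(steepness * u + steepness * half_width)
    falling = sigmoid(steepness * u - steepness * half_width)
    return rising - falling

def bump(x1, x2, gain=20.0, threshold=-30.0):
    """Second layer neuron: sums two perpendicular ridges (each weighted by
    `gain`) and applies its own threshold so that only the region where
    both ridges overlap produces a high output."""
    return sigmoid(gain * (ridge(x1) + ridge(x2)) + threshold)

if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.0, 5.0), (5.0, 0.0), (5.0, 5.0)]:
        print(point, round(bump(*point), 4))
```

At the centre both ridges are high and the second layer neuron switches on; along a single ridge or far from both, the summed input stays below the second layer threshold and the output remains near zero.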


C. CALCULATION OF WEIGHTS AND THRESHOLDS 
The burning question that has yet to be addressed concerning the feedforward
sigmoidal neural network is how do we calculate the weights ($\mathbf{w}$) and the neuron
thresholds ($\theta$) of the network to yield a satisfactory representation of a given nonlinear
function. A method called backpropagation, developed by Rumelhart, has proven
popular and has been demonstrated to work fairly well [Ref. 2]. The backpropagation
method uses a training data set consisting of sets of inputs and a desired output value.
A set of inputs is applied to the neural network and the resulting network output is
compared to the desired value. The error between the neural network's output and
the desired output, along with the current state of the neural network, is used to modify
the neural network's weights and threshold values. The state of the neural network is
defined by the current input to the network, its weights, thresholds, and each neuron's
transfer function. The backpropagation method attempts to minimize the sum of the
squared errors over the entire training data set. This can be expressed as

$$E = \sum_t e^2(t) = \sum_t \left[y(t) - z(t)\right]^2 \qquad (2.11)$$

where $E$ is the total squared error, $e(t)$ is the network output error for the $t$-th input
set, $y(t)$ is the desired or target output for the $t$-th input set, and $z(t)$ is the actual
output of the neural net for the $t$-th input set. The weights and the thresholds of the
network are iteratively updated in proportion to the gradient of the total squared
error, $E$. This can be expressed as

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \Delta\mathbf{w}(n) = \mathbf{w}(n) - \epsilon \frac{\partial E}{\partial \mathbf{w}} \qquad (2.12)$$

and

$$\theta(n+1) = \theta(n) + \Delta\theta(n) = \theta(n) - \epsilon \frac{\partial E}{\partial \theta} \qquad (2.13)$$

where $\mathbf{w}(n)$ and $\theta(n)$ are the weights and thresholds at the $n$-th iteration of the
algorithm, $\Delta\mathbf{w}(n)$ and $\Delta\theta(n)$ are the incremental changes to the weights and thresholds,
and $\epsilon$ is the proportionality constant [Ref. 2:p. 130]. The backpropagation method
gets its name from the fact that the error at the output of the network is propagated
back through the network in the form of gradients in order to update the network's
weights and thresholds.
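
The update rule can be illustrated on the smallest possible case, a single sigmoidal neuron, where no backward propagation through hidden layers is needed. The sketch below is a simplified illustration under stated assumptions: the objective is taken as the plain sum of squared errors, the sigmoid is the standard logistic function, and the training data (logical AND), step size, and iteration count are arbitrary choices made here.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def output(x, w, theta):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + theta)

def total_squared_error(data, w, theta):
    """E = sum over the training set of [y(t) - z(t)]^2."""
    return sum((y - output(x, w, theta)) ** 2 for x, y in data)

def gradient(data, w, theta):
    """Partial derivatives of E with respect to the weights and threshold."""
    grad_w = [0.0] * len(w)
    grad_theta = 0.0
    for x, y in data:
        z = output(x, w, theta)
        # Chain rule: dE/ds = -2 (y - z) * z (1 - z) for the logistic sigmoid.
        common = -2.0 * (y - z) * z * (1.0 - z)
        for i, xi in enumerate(x):
            grad_w[i] += common * xi
        grad_theta += common
    return grad_w, grad_theta

def train(data, w, theta, epsilon=0.2, iterations=5000):
    """Steepest descent with a fixed proportionality constant epsilon,
    updating once per complete pass through the training set."""
    for _ in range(iterations):
        grad_w, grad_theta = gradient(data, w, theta)
        w = [wi - epsilon * gi for wi, gi in zip(w, grad_w)]
        theta = theta - epsilon * grad_theta
    return w, theta

# Illustrative training set (logical AND); not taken from the thesis.
AND_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
            ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

if __name__ == "__main__":
    w, theta = train(AND_DATA, [0.0, 0.0], 0.0)
    print("weights", w, "threshold", theta)
    print("final error", total_squared_error(AND_DATA, w, theta))
```

Even on this tiny problem several thousand passes are used, which gives some feel for the slow convergence of fixed-step steepest descent.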

The backpropagation method is essentially a steepest descent optimization
algorithm which uses the gradient of the squared error function to modify the weights
and thresholds of the neural network [Ref. 2:p. 127]. One requirement dictated by
this gradient method is that the transfer function of the neurons be continuously
differentiable [Ref. 2:p. 131]. As a result, this method cannot be used with the signum
transfer function because of its discontinuity. The method, however, does work for
the linear and sigmoidal transfer function cases.

As presented above, the weights and thresholds are updated after a complete
pass of the entire training data set through the network. In the actual
implementation of the algorithm, however, Rumelhart updates the weights and thresholds of
the network after each input/desired output pair is applied [Ref. 2:pp. 136-137].
His rationale for doing this is that the algorithm converges so slowly that it does
not affect the overall convergence rate, and that it is more gratifying to update the
weights and thresholds more frequently [Ref. 2:p. 137]. As Rumelhart indicated,
the steepest descent method is extremely slow to converge. It was this deficiency that
led to the development of this thesis project. Lapedes and Farber indicated that a
related optimization method, the conjugate gradient algorithm, yielded a significant
improvement in the convergence rate of the backpropagation method [Ref. 12]. The
following chapter will address the development and application of this optimization
method to a feedforward sigmoidal neural network.


III. DERIVATION OF THE ADAPTATION
ALGORITHM 


A. THE CONJUGATE GRADIENT METHOD 
1. General Description 

The conjugate gradient method is an iterative method for optimizing a
set of coefficients $\mathbf{h}$ in order to minimize a given objective function $J(\mathbf{h})$. It falls
into the class of optimization methods that apply a multidimensional search using
derivatives to the optimization problem [Ref. 13:pp. 289-316]. The steepest descent
method, which Rumelhart uses for adapting the feedforward neural network, is also
a member of this class [Ref. 2]. This class of optimization methods, called gradient
methods, treats the objective function $J(\mathbf{h})$ as a multidimensional surface over which
it iteratively searches for the absolute or global minimum [Ref. 13:pp. 289-316]. The
coefficients $\mathbf{h}$ are the multidimensional coordinates which define where the algorithm
is located on the surface during any particular iteration. This class of optimization
methods requires that the objective function be differentiable with respect to the
coefficients $\mathbf{h}$ that are adapted [Ref. 13:p. 289]. The vector of partial derivatives is called the
gradient $\mathbf{g}$ of the objective function. When evaluated for a given set of coefficients $\mathbf{h}$,
the gradient $\mathbf{g}$ is a multidimensional vector which is tangent to the objective function
surface at a point defined by the coefficients $\mathbf{h}$. This vector points in the direction of
greatest increase. The negative of the gradient ($-\mathbf{g}$) logically points downhill in the
direction of greatest decrease. Thus, the gradient vector $\mathbf{g}$ can provide a direction
along the surface of the objective function in which to search for the global minimum.
The advantage of gradient methods is that they decompose the optimization problem
from a multidimensional search of the objective function surface to a sequence of line
searches along directions determined by the gradient vector $\mathbf{g}$.

The method of steepest descent uses the gradient vector $\mathbf{g}$ directly to
perform its iterative line search of the objective function surface [Ref. 14:pp. 214-220].
Rumelhart points out that the steepest descent method works well when the objective
function surface is quadratic or bowl-shaped with a single global minimum [Ref. 2:p.
132]. He states, however, that the more complex objective function surfaces
associated with multilayer neural networks frequently contain many local minima [Ref. 2:p.
132]. As a result, the steepest descent method can become trapped in one of these
local minima, yielding a less than optimal solution. This is because the magnitude of
the gradient vector decreases as the algorithm approaches a local minimum. The distance
the steepest descent algorithm travels for a given iteration is a function of a constant
times the magnitude of the gradient. Therefore, as the magnitude of the gradient
decreases, the distance the algorithm travels along the surface decreases. Compounding
the problem is the fact that each successive gradient is orthogonal to the previous
gradient. This causes the algorithm to zigzag in ever smaller steps as it approaches
the bottom of a local minimum. The result is that the algorithm becomes trapped
at the bottom of a local minimum and never reaches the optimal point or global
minimum. Use of a constant stepsize also causes the steepest descent algorithm to be
extremely slow to converge [Ref. 13:pp. 290-291].

The conjugate gradient approach is motivated by a desire to accelerate
the convergence rate of the steepest descent method without greatly increasing the
complexity of the algorithm. The conjugate gradient method uses a succession of
direction vectors $\mathbf{d}_k$ that are conjugate to the gradient vector $\mathbf{g}_k$ obtained as the
algorithm progresses. The direction along which the algorithm searches, $\mathbf{d}_k$, is a
linear combination of present and past values of the gradient vector. The result is
that the gradient vector $\mathbf{g}_k$ is orthogonal to the subspace $F_k$, which is defined by
the set of all previous direction vectors $\mathbf{d}_0, \mathbf{d}_1, \ldots, \mathbf{d}_{k-1}$. Each successive iteration
essentially adds an additional dimension to the subspace $F_k$. The distance $\alpha_k$ that
the algorithm travels along the line search direction $\mathbf{d}_k$ also varies for each iteration
of the algorithm. This makes the method only slightly more complicated than the
steepest descent method. The algorithm, however, does not become trapped in local
minima as easily as the steepest descent method and converges steadily to the global
minimum or optimal set of coefficients $\mathbf{h}$ [Ref. 13:pp. 297-316].

2. Notation Summary 


The notation used to describe the conjugate gradient method is as follows: 


$J(\mathbf{h})$    Objective function to be minimized.
$\mathbf{h}_k$     Coefficient vector at the $k$-th iteration.
$\mathbf{g}_k$     Gradient vector of the objective function at the $k$-th iteration.
$\mathbf{d}_k$     Search direction vector at the $k$-th iteration.
$\alpha_k$     Search distance coefficient at the $k$-th iteration.
$\beta_k$     Deflection coefficient at the $k$-th iteration.


3. Summary of the Conjugate Gradient Algorithm
A summary of the conjugate gradient method for minimizing a differentiable
objective function $J(\mathbf{h})$ is listed below [Ref. 13:p. 306]:

Step 1. Choose an initial set of coefficients $\mathbf{h}_0$.

Step 2. Calculate the initial gradient $\mathbf{g}_0$ using the definition

$$\mathbf{g}_0 = \frac{\partial J(\mathbf{h}_0)}{\partial \mathbf{h}} \qquad (3.1)$$

Step 3. Let the initial direction vector be $\mathbf{d}_0 = -\mathbf{g}_0$.

Step 4. Let $k = 0$.

Step 5. Let $\alpha_k$ be the optimal solution to the problem to minimize
$J(\mathbf{h}_k + \alpha_k \mathbf{d}_k)$ subject to $\alpha_k \ge 0$.

Step 6. Update the new coefficients $\mathbf{h}_{k+1}$ using the equation

$$\mathbf{h}_{k+1} = \mathbf{h}_k + \alpha_k \mathbf{d}_k. \qquad (3.2)$$

Step 7. Calculate the next gradient vector value $\mathbf{g}_{k+1}$ using the new
coefficients $\mathbf{h}_{k+1}$.

Step 8. Calculate the deflection coefficient $\beta_k$ using the equation

$$\beta_k = \frac{(\mathbf{g}_{k+1} - \mathbf{g}_k)^T \mathbf{g}_{k+1}}{\mathbf{g}_k^T \mathbf{g}_k}. \qquad (3.3)$$

Step 9. Update the direction vector $\mathbf{d}_{k+1}$ using the equation

$$\mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k. \qquad (3.4)$$

Step 10. Replace $k$ by $k + 1$ and go to step 5.
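
The ten steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the line search is a golden section search standing in for the Fibonacci search, the upper bound on the search distance and the stopping tolerance are arbitrary, and the quadratic test function `J` is hypothetical.

```python
def golden_section(phi, upper, iterations=40):
    """Minimise phi(alpha) on [0, upper]; a stand-in line search."""
    ratio = (5.0 ** 0.5 - 1.0) / 2.0
    a, b = 0.0, upper
    lam = b - ratio * (b - a)
    mu = a + ratio * (b - a)
    f_lam, f_mu = phi(lam), phi(mu)
    for _ in range(iterations):
        if f_lam > f_mu:                 # minimum lies in [lam, b]
            a = lam
            lam, f_lam = mu, f_mu
            mu = a + ratio * (b - a)
            f_mu = phi(mu)
        else:                            # minimum lies in [a, mu]
            b = mu
            mu, f_mu = lam, f_lam
            lam = b - ratio * (b - a)
            f_lam = phi(lam)
    return 0.5 * (a + b)

def conjugate_gradient(J, grad, h0, upper=1.0, iterations=50, tol=1e-10):
    h = list(h0)                                   # Step 1
    g = grad(h)                                    # Step 2
    d = [-gi for gi in g]                          # Step 3
    for _ in range(iterations):                    # Steps 4 and 10
        if sum(gi * gi for gi in g) < tol:
            break                                  # gradient is negligible
        # Step 5: line search for the search distance alpha_k >= 0.
        alpha = golden_section(
            lambda a: J([hi + a * di for hi, di in zip(h, d)]), upper)
        h = [hi + alpha * di for hi, di in zip(h, d)]          # Step 6
        g_new = grad(h)                                        # Step 7
        # Step 8: Polak-Ribiere deflection coefficient.
        beta = (sum((gn - go) * gn for gn, go in zip(g_new, g))
                / sum(go * go for go in g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]      # Step 9
        g = g_new
    return h

# Hypothetical quadratic test objective with its minimum at (1, -2).
def J(h):
    return (h[0] - 1.0) ** 2 + 10.0 * (h[1] + 2.0) ** 2

def grad_J(h):
    return [2.0 * (h[0] - 1.0), 20.0 * (h[1] + 2.0)]

if __name__ == "__main__":
    print(conjugate_gradient(J, grad_J, [0.0, 0.0]))
```

The upper bound of 1.0 on the search distance is adequate for this test function only; in general it has to be chosen to suit the objective, as the discussion of the line search makes clear.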


4. Selection of a Line Search Method 
The conjugate gradient method outlined above requires that a search
distance coefficient $\alpha_k$ be found that minimizes the objective function $J(\mathbf{h}_k + \alpha_k \mathbf{d}_k)$
subject to $\alpha_k \ge 0$. This dictates that a line search be performed starting at the point
in multidimensional space defined by the current coefficient vector $\mathbf{h}_k$, and proceeding
along the line defined by the current direction vector $\mathbf{d}_k$ until the minimum value of
the objective function is found. The distance the line search algorithm travels from
the point $\mathbf{h}_k$ to the minimum value of the function is then defined to be the scalar
value $\alpha_k$. A number of methods have been proposed to perform this line search.
These include the uniform search, dichotomous search, the golden section method,
and the Fibonacci method [Ref. 13:pp. 253-264]. There is also a class of line search
methods which use derivatives to assist in finding the minimum value of the objective
function [Ref. 13:pp. 264-269]. This second group of methods was considered for
use with the conjugate gradient method but was subsequently rejected due to the
complexity of calculating and evaluating the required derivatives. The selection of
an appropriate line search method for use in conjunction with the conjugate gradient
method was based primarily on efficiency. All of the methods except for the Fibonacci
search require two evaluations of the objective function during each iteration of the
algorithm. The Fibonacci method, however, requires only a single evaluation because
it also uses the results from the previous iteration. Comparison of the line search
methods mentioned above revealed that the Fibonacci search method is the most
efficient [Ref. 13:p. 264]. As a result, the Fibonacci search method was chosen to be
used in conjunction with the conjugate gradient method.

The Fibonacci method performs a search for the minimum value of a
function of a single variable over a closed bounded interval $[a, b]$. The function in this
case is $J(\mathbf{h}_k + \alpha_k \mathbf{d}_k)$ where $\alpha_k$ is the single variable. The interval over which the
algorithm searches is called the interval of uncertainty and limits the range of values
for $\alpha_k$. The lower limit for $\alpha_k$ is given by the conjugate gradient method as zero,
but the upper limit must be specified before the algorithm can begin. The interval of
uncertainty is steadily reduced as the algorithm progresses. The number of iterations
which the algorithm will perform must also be specified before the start of the
algorithm. The Fibonacci method is based on the Fibonacci sequence $F_n$, which is defined
as

$$F_n = F_{n-1} + F_{n-2} \qquad (3.5)$$

$$F_0 = F_1 = 1. \qquad (3.6)$$

The resulting sequence is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, .... The Fibonacci search
method begins by evaluating the objective function at each of two points within the
interval of uncertainty as shown in Figure 3.1.


These two points, which we will call $\lambda_j$ and $\mu_j$, are calculated using

$$\lambda_j = a_j + \frac{F_{n-j-1}}{F_{n-j+1}} (b_j - a_j) \qquad (3.7)$$

and

$$\mu_j = a_j + \frac{F_{n-j}}{F_{n-j+1}} (b_j - a_j) \qquad (3.8)$$

where $k$ is the iteration index of the conjugate gradient algorithm, $j$ is the iteration
index of the Fibonacci algorithm, $[a_j, b_j]$ is the current interval of uncertainty, and $n$
is the total number of iterations planned. A new interval of uncertainty $[a_{j+1}, b_{j+1}]$
is then selected based on the value of the objective function at the two points $\lambda_j$ and
$\mu_j$. If $J(\mathbf{h}_k + \lambda_j \mathbf{d}_k) > J(\mathbf{h}_k + \mu_j \mathbf{d}_k)$, then the new interval of uncertainty $[a_{j+1}, b_{j+1}]$
is given by $[\lambda_j, b_j]$. Likewise, if the opposite is true, $J(\mathbf{h}_k + \lambda_j \mathbf{d}_k) < J(\mathbf{h}_k + \mu_j \mathbf{d}_k)$,
then the new interval of uncertainty is $[a_j, \mu_j]$. The two cases are shown in Figures 3.2
and 3.3. The key feature that makes the Fibonacci method so attractive is that, for the next
iteration $j + 1$, either $\lambda_{j+1} = \mu_j$ or $\mu_{j+1} = \lambda_j$, depending on which new interval of
uncertainty was selected. Since the objective function has already been evaluated at
the previous values for $\lambda_j$ and $\mu_j$, only one additional evaluation must be made
for each succeeding iteration. At the completion of the specified iterations of the
algorithm, the size of the final interval of uncertainty will be

$$(b_n - a_n) = \frac{(b_0 - a_0)}{F_n}. \qquad (3.9)$$

Figure 3.1: Initial evaluation points $\lambda_0$ and $\mu_0$ and interval of uncertainty

Figure 3.2: Evaluation points $\lambda_{j+1}$ and $\mu_{j+1}$ and revised interval of uncertainty
when $J(\lambda_j) > J(\mu_j)$

Figure 3.3: Evaluation points $\lambda_{j+1}$ and $\mu_{j+1}$ and revised interval of uncertainty
when $J(\lambda_j) < J(\mu_j)$


If we select the midpoint of the final interval of uncertainty as the value $\alpha_k$ to be used
by the conjugate gradient method, then we can calculate the number of iterations $n$
required to achieve a desired accuracy after deciding upon an upper bound $b_0$. The
upper bound and number of iterations used for the neural network problem will be
presented in the next chapter.
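
A Fibonacci search along these lines can be sketched as follows. Several details are assumptions of this sketch rather than the thesis implementation: the iteration index is zero-based, so the Fibonacci ratios appear shifted by one relative to the equations above; the search stops when the one remaining interior point sits at the midpoint of the interval, so the returned interval has length 2(b - a)/F_n and its midpoint lies within (b - a)/F_n of the minimum; and the function is assumed to have a single minimum on the interval.

```python
def fibonacci_numbers(n):
    """Return [F_0, F_1, ..., F_n] with F_0 = F_1 = 1."""
    numbers = [1, 1]
    while len(numbers) <= n:
        numbers.append(numbers[-1] + numbers[-2])
    return numbers

def fibonacci_search(phi, a, b, n):
    """Reduce the interval of uncertainty [a, b] for a unimodal phi.

    Uses n - 1 evaluations of phi in total: two at the start and one for
    each later reduction that needs a new interior point.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    F = fibonacci_numbers(n)
    lam = a + F[n - 2] / F[n] * (b - a)
    mu = a + F[n - 1] / F[n] * (b - a)
    f_lam, f_mu = phi(lam), phi(mu)
    for j in range(1, n - 2):
        if f_lam > f_mu:
            # Keep [lam, b]; the old mu becomes the new lam.
            a = lam
            lam, f_lam = mu, f_mu
            mu = a + F[n - j - 1] / F[n - j] * (b - a)
            f_mu = phi(mu)
        else:
            # Keep [a, mu]; the old lam becomes the new mu.
            b = mu
            mu, f_mu = lam, f_lam
            lam = a + F[n - j - 2] / F[n - j] * (b - a)
            f_lam = phi(lam)
    # Final reduction needs no new evaluation.
    if f_lam > f_mu:
        a = lam
    else:
        b = mu
    return a, b

if __name__ == "__main__":
    low, high = fibonacci_search(lambda alpha: (alpha - 0.3) ** 2, 0.0, 1.0, 20)
    print("interval:", low, high, "midpoint:", 0.5 * (low + high))
```

Because the final interval length depends only on n and the initial bounds, the number of iterations needed for a target accuracy can be fixed before the search starts, which is the property the text relies on.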

5. Calculation of the Deflection Coefficient $\beta_k$

The equation used to calculate the deflection constant $\beta_k$ (equation 3.3)
is the Polak-Ribiere version of the conjugate gradient method originally proposed by
Fletcher and Reeves [Ref. 14:p. 253]. The original method used the equation

    $\beta_k = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}$    (3.10)

to calculate the deflection constant $\beta_k$. The two equations are equivalent if the
objective function to be minimized is quadratic. Experimental results, however, tend
to indicate that the Polak-Ribiere method is more effective for nonquadratic objective
functions [Ref. 11:p. 251]. This is because the Polak-Ribiere method tends to
reset the direction vector $d_{k+1}$ to the value of the gradient vector $g_{k+1}$ when
two successive gradients $g_k$ and $g_{k+1}$ are equal. This has the effect of beginning the
conjugate gradient method anew, using the present coefficient vector $h_k$ as the new
initial coefficient vector $h_0$.
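
The reset behaviour can be checked numerically. The sketch below assumes the standard textbook form of the Polak-Ribiere coefficient, $\beta_k = g_{k+1}^T (g_{k+1} - g_k) / g_k^T g_k$; equation 3.3 is not reproduced in this section, so that form, like the function names, is an assumption.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def beta_fletcher_reeves(g_new, g_old):
    """Fletcher-Reeves deflection coefficient (equation 3.10)."""
    return dot(g_new, g_new) / dot(g_old, g_old)


def beta_polak_ribiere(g_new, g_old):
    """Polak-Ribiere deflection coefficient (assumed textbook form)."""
    diff = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, diff) / dot(g_old, g_old)
```

With two identical gradients the Polak-Ribiere value is zero, so the next direction depends on the current gradient alone, while the Fletcher-Reeves value is one. When successive gradients are orthogonal, as they are for a quadratic objective with an exact line search, the two coefficients coincide.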


B. APPLYING THE CONJUGATE GRADIENT METHOD TO A NEURAL NETWORK
1. The Neural Network Model and Notation
The generic neural network model to be used for the purposes of discussion
is pictured in Figure 3.4.

Figure 3.4: Neural network model

The notation used when referring to the various variables of the model is as follows:

$x_{ij}$  The $j$th input to the $i$th layer of the network. For other than the inputs
$x_{01}, x_{02}, \ldots$, the variable $x_{ij}$ is also the output of the $j$th neuron in the
$(i - 1)$th layer and is a function of the previous layer's inputs and weights and
the $j$th neuron's threshold value.

$w_{ijk}$  The weight in the $i$th layer of the network that connects the $j$th input to the
$k$th neuron of the layer.

$\theta_{ik}$  The threshold value associated with the $k$th neuron of the $i$th layer of neurons.

$y$  The desired output value of the network for a given set of inputs $x_{01}, x_{02}, \ldots$.

$f(\cdot)$  The transfer function of the neuron.


2. The Neural Network Objective Function J(h)
As was mentioned in the previous chapter, we wish to minimize the total
sum of the squared errors over an entire training data set. As a result, the objective
function $J(h)$ to be minimized using the conjugate gradient method is

    $J(h) = \frac{1}{2} \sum_t e^2(t)$    (3.11)

where $e(t)$ is the error between the actual and the desired outputs of the neural
network for the $t$th training data set.
3. The Adaptation Coefficients h

There are two quantities that we wish to adapt in order for the neural
network to consistently produce the desired output for a given input. These two
quantities are the connection weights $w_{ijk}$ of the network and the threshold values $\theta_{ik}$
associated with each neuron in the network. Together, these two sets of coefficients
form the coefficient vector $h$. The conjugate gradient algorithm uses a single vector
$h$ to represent the coefficients which are adapted to minimize the objective function
$J(h)$. The notation used for the neural network model, however, reflects the use of
matrices $[w_{ijk}]$ for the weights and vectors $[\theta_{ik}]$ for the thresholds. This was done to
simplify the identification of the various weights and thresholds. We must therefore
combine and transform the weight matrices and threshold vectors into a single vector
$h$ in order to apply the conjugate gradient algorithm. This is done by assigning the
individual weights and thresholds to a vector as shown in equation 3.12.

    $h = [w_{011} \; w_{012} \; \cdots \; w_{111} \; \cdots \; w_3 \; \theta_{01} \; \theta_{02} \; \cdots \; \theta_2]^T$    (3.12)

We can perform the conjugate gradient algorithm using the vector notation and then
perform a reverse transformation at the completion of the algorithm to assign the
final weights and threshold values to the neural network.
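
A minimal sketch of the forward and reverse transformations follows. It assumes the weights of each layer are held as a nested list `w[j][k]` and the thresholds as a flat list per layer; the names `pack` and `unpack` and the row-major ordering are illustrative choices, not taken from the thesis program.

```python
def pack(weights, thresholds):
    """Flatten per-layer weight matrices and threshold vectors into one vector h."""
    h = [w for layer in weights for row in layer for w in row]
    h += [t for layer in thresholds for t in layer]
    return h


def unpack(h, weight_shapes, threshold_sizes):
    """Reverse of pack: rebuild the matrices and vectors from h.

    weight_shapes is a list of (rows, cols) pairs, one per layer.
    """
    values = iter(h)
    weights = [[[next(values) for _ in range(cols)] for _ in range(rows)]
               for rows, cols in weight_shapes]
    thresholds = [[next(values) for _ in range(size)] for size in threshold_sizes]
    return weights, thresholds
```

Any fixed ordering works, provided the gradient vector is assembled in the same order as $h$.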
4. The Gradient Vector g
The gradient vector $g$ used by the conjugate gradient method is defined as

    $g = \frac{\partial}{\partial h} J(h)$.    (3.13)

The gradient vector $g$ for the neural network problem consists of the gradients
associated with the weights and thresholds of the neural network. The gradient vector $g$
would be of the form

    $g = [g_{011} \; g_{012} \; \cdots \; g_{111} \; \cdots \; g_3 \; g_{\theta_{01}} \; g_{\theta_{02}} \; \cdots \; g_{\theta_2}]^T$.    (3.14)

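The gradient for any particular weight or threshold of the network is calculated by
taking the partial derivative of the error function $E$ with respect to the particular
weight ($w_{ijk}$) or threshold ($\theta_{ik}$). For the gradient associated with a weight this would
be expressed as

    $g_{ijk} = \frac{\partial E}{\partial w_{ijk}} = \frac{\partial}{\partial w_{ijk}} \left( \frac{1}{2} \sum_t e^2(t) \right)$    (3.15)

and for the gradient associated with a threshold as

    $g_{\theta_{ik}} = \frac{\partial E}{\partial \theta_{ik}} = \frac{\partial}{\partial \theta_{ik}} \left( \frac{1}{2} \sum_t e^2(t) \right)$.    (3.16)

The partial derivative in equations 3.15 and 3.16 can be moved inside the respective
summation terms resulting in the following expressions

    $g_{ijk} = \sum_t \frac{\partial}{\partial w_{ijk}} \left( \frac{1}{2} e^2(t) \right)$    (3.17)

    $g_{\theta_{ik}} = \sum_t \frac{\partial}{\partial \theta_{ik}} \left( \frac{1}{2} e^2(t) \right)$.    (3.18)
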
The gradient for each weight $w_{ijk}$ can therefore be expressed as the sum of the partial
gradients

    $g_{ijk} = \sum_t g_{ijk}(t)$    (3.19)

where the partial gradient $g_{ijk}(t)$ is the gradient associated with the weight $w_{ijk}$ when
evaluated for a single set of training data rather than the entire training data set. The
gradients associated with the threshold values of the neural network can be expressed
in a similar manner, given by

    $g_{\theta_{ik}} = \sum_t g_{\theta_{ik}}(t)$.    (3.20)

For the purposes of notational brevity, we will assume that the training data set
consists of only one set of inputs and the associated desired output. This will allow
us to reduce the length of the equations for the gradient by removing references to the
particular element of the training set used. The reader should remember, however,
that if there are $s$ pairs of data in the training set, then the gradient is the sum of
the $s$ partial gradients as expressed in equation 3.19 and equation 3.20.

a. Neuron Transfer Function Derivative

Before delving into the derivation of the equations for the gradients
of the weights and thresholds of the neural network, a few comments should be made
concerning the transfer function used for the neural network model and its derivative.
The transfer function to be used is the sigmoid function defined by equation 2.9
in Chapter 2. A key feature of the sigmoidal function is that its derivative can be
expressed in terms of its original value by

    $\frac{\partial}{\partial z} f(z) = f(z) \left( 1 - f(z) \right)$.    (3.21)

The derivative of a neuron's output can thus be expressed as a function of the output
of the neuron and the partial derivative of the neuron's inputs. The partial derivative
of the neuron's output with respect to $w_{ijk}$ is then given by

    $\frac{\partial x_{(i+1)k}}{\partial w_{ijk}} = -x_{(i+1)k} \left( 1 - x_{(i+1)k} \right) \frac{\partial}{\partial w_{ijk}} \left( \theta_{ik} - \sum_j w_{ijk} x_{ij} \right)$    (3.22)

and

    $\frac{\partial x_{(i+1)k}}{\partial \theta_{ik}} = -x_{(i+1)k} \left( 1 - x_{(i+1)k} \right) \frac{\partial}{\partial \theta_{ik}} \left( \theta_{ik} - \sum_j w_{ijk} x_{ij} \right)$    (3.23)

for the derivative with respect to $\theta_{ik}$. Equations 3.22 and 3.23 will be used frequently
to evaluate the partial derivatives of each neuron's output when deriving the equations
for the gradients of the neural network.
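
The identity in equation 3.21 is easy to confirm numerically. The sketch below assumes the usual logistic form $f(z) = 1 / (1 + e^{-z})$ for the sigmoid, since equation 2.9 is not reproduced in this chapter, and compares the closed-form derivative with a central difference.

```python
import math


def sigmoid(z):
    """Logistic sigmoid (assumed form of the transfer function)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative expressed through the function value, as in equation 3.21."""
    f = sigmoid(z)
    return f * (1.0 - f)


def central_difference(func, z, step=1e-6):
    """Numerical derivative used only as an independent check."""
    return (func(z + step) - func(z - step)) / (2.0 * step)
```

Because the derivative needs only the neuron's output value, no additional exponential has to be evaluated when the gradients are computed.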
b. Calculation of the Third Layer Gradient

The calculation of the gradients for each weight and threshold of the
neural network begins at the output of the neural network where the difference
between the actual network output and the desired output produces an error. This
error is propagated back through the network in the form of gradients. The gradient
associated with the output weight $w_3$ can be expressed as

    $g_3 = \frac{\partial E}{\partial w_3} = \frac{\partial}{\partial w_3} \frac{1}{2} (y - w_3 x_3)^2$    (3.24)

where $w_3 x_3$ is the output of the network and $y$ is the desired output value. Taking
the partial derivative yields

    $g_3 = (y - w_3 x_3) \frac{\partial}{\partial w_3} (y - w_3 x_3) = (y - w_3 x_3)(-x_3)$.    (3.25)

After rearranging the terms of equation 3.25, the final form for the output weight's
gradient $g_3$ becomes

    $g_3 = (w_3 x_3 - y) x_3$.    (3.26)

c. Calculation of the Second Layer Gradients

Derivation of the input, first and second layer gradients is somewhat
more involved than that of the third layer gradients because of the multiple neurons
and weights between the error at the output and the gradient for which we are deriving
an expression. The gradient equation for a weight in the second layer can be expressed
as

    $g_{2j} = \frac{\partial E}{\partial w_{2j}} = (y - w_3 x_3) \frac{\partial}{\partial w_{2j}} (y - w_3 x_3)$.    (3.27)

Of the terms evaluated by the partial derivative, only the output of the third layer
neuron $x_3$ is affected by a variation of the second layer weight $w_{2j}$. The desired output
$y$ can be eliminated and the partial derivative shifted to the right of the output weight
term $w_3$. This yields the expression

    $g_{2j} = (y - w_3 x_3)(-w_3) \frac{\partial x_3}{\partial w_{2j}}$.    (3.28)

We can replace the partial derivative term in equation 3.28 with an equivalent
expression that can be evaluated with respect to $w_{2j}$ using equation 3.22. This results
in the following expression

    $g_{2j} = (y - w_3 x_3)(-x_3)(-w_3)(1 - x_3) \frac{\partial}{\partial w_{2j}} \left( \theta_2 - \sum_j w_{2j} x_{2j} \right)$.    (3.29)

Comparing the first part of equation 3.29 with equation 3.26, we find that we can
replace the first two terms of equation 3.29 with the output weight's gradient $g_3$.
After taking the partial derivative, only one term, $-x_{2j}$, remains. The equation for the
second layer weight gradient becomes

    $g_{2j} = g_3 w_3 (1 - x_3) x_{2j}$.    (3.30)

We can see from equation 3.30 that the gradient $g_{2j}$ is a function of the weight $w_3$ that
connects the neuron's output to the next layer, the gradient $g_3$ that is associated with
the output weight, the neuron's output value $x_3$, and the input $x_{2j}$ that is applied to
the weight for which we are calculating the gradient ($g_{2j}$). This relationship between
the inputs, outputs, weights and gradients will be found to be consistent for each of
the gradients of the neural network.

Rather than starting from scratch to derive the equation for the gradient
associated with the output neuron's threshold $\theta_2$, we begin at the point where
evaluation of the partial derivative with respect to $\theta_2$ differs from that for the weight
gradient $g_{2j}$. The equation for the gradient of the output neuron's threshold becomes

    $g_{\theta_2} = g_3 (-w_3)(1 - x_3) \frac{\partial}{\partial \theta_2} \left( \theta_2 - \sum_j w_{2j} x_{2j} \right)$.    (3.31)

Evaluation of the partial derivative yields a constant of one since none of the
summation terms is a function of the threshold value $\theta_2$. Shifting the sign term, the final
form for the gradient is

    $g_{\theta_2} = -g_3 w_3 (1 - x_3)$.    (3.32)

Note that the equation for the gradient of the neuron's threshold value $\theta_2$ (equation
3.32) has the same form as that for the input weights $w_{2j}$ connected to the output
neuron (equation 3.29) except for the input term $x_{2j}$. We can treat the threshold
value as a weight if we assume that the threshold 'weight' has a constant input of $-1$.
d. Calculation of the First Layer Gradients

The derivation of the equation for the gradient of the first layer weights
follows in a similar fashion to that of the second layer. We begin at the point where
evaluation of the partial derivative differs (equation 3.29). The equation for the first
layer weight gradient becomes

    $g_{1jk} = \frac{\partial E}{\partial w_{1jk}} = g_3 (-w_3)(1 - x_3) \frac{\partial}{\partial w_{1jk}} \left( \theta_2 - \sum_k w_{2k} x_{2k} \right)$.    (3.33)

Only the output of the $k$th neuron in the second layer ($x_{2k}$) is affected by the value of
the weight $w_{1jk}$ of the first layer. Therefore all terms except for the $k$th term of the
summation in equation 3.33 are zero when the partial derivative is taken. This yields
the expression

    $g_{1jk} = -g_3 w_3 (1 - x_3)(-w_{2k}) \frac{\partial x_{2k}}{\partial w_{1jk}}$.    (3.34)

Using equation 3.22 we can rewrite equation 3.34 as

    $g_{1jk} = g_3 w_3 (1 - x_3) x_{2k} (-w_{2k})(1 - x_{2k}) \frac{\partial}{\partial w_{1jk}} \left( \theta_{1k} - \sum_j w_{1jk} x_{1j} \right)$.    (3.35)

The first part of equation 3.35 can be replaced with the gradient $g_{2k}$ using equation
3.30. Only the $j$th term of the summation under evaluation by the partial derivative
with respect to $w_{1jk}$ is nonzero. The equation for the first layer gradients of the
weights then becomes

    $g_{1jk} = g_{2k} (-w_{2k})(1 - x_{2k})(-x_{1j})$    (3.36)

which when rearranged yields

    $g_{1jk} = g_{2k} w_{2k} (1 - x_{2k}) x_{1j}$.    (3.37)

Again, the present layer's gradient is a function of the next layer's gradients and
weights, the present layer's neuron output values, and the input to the present layer.

The derivation of the equation for the gradient associated with the
neuron thresholds of the first layer follows in the same manner as that of the second
layer. The equation for the threshold gradients $g_{\theta_{1k}}$ of the first layer can be expressed
as

    $g_{\theta_{1k}} = g_{2k} (-w_{2k})(1 - x_{2k}) \frac{\partial}{\partial \theta_{1k}} \left( \theta_{1k} - \sum_j w_{1jk} x_{1j} \right)$.    (3.38)

Evaluating the partial derivative results in the final equation

    $g_{\theta_{1k}} = -g_{2k} w_{2k} (1 - x_{2k})$    (3.39)

which has the same form as equation 3.32.
e. Calculation of the Input Layer Gradients
Derivation of the input layer's gradient equation differs only slightly
from the previous development. The difference is due to the fact that a variation in
the value of a weight in the input layer affects the output of more than a single neuron
in the next layer of the network. This means that we must retain a summation term
throughout the calculation of the input layer's gradient equation. The gradient for the
input layer weight can be expressed as

    $g_{0jk} = \frac{\partial E}{\partial w_{0jk}} = g_3 (-w_3)(1 - x_3) \frac{\partial}{\partial w_{0jk}} \left( \theta_2 - \sum_p w_{2p} x_{2p} \right)$.    (3.40)

The threshold $\theta_2$ is not a function of the input layer's weights and is eliminated when
its partial derivative is taken with respect to the input weight $w_{0jk}$. The other terms
under evaluation by the partial derivative (i.e., $x_{2p}$) are all, however, a function of
the input weight $w_{0jk}$. The partial derivative can be moved inside the summation
resulting in

    $g_{0jk} = g_3 (-w_3)(1 - x_3) \sum_p (-w_{2p}) \frac{\partial x_{2p}}{\partial w_{0jk}}$.    (3.41)

Shifting the summation to the far left and evaluating the partial derivative using
equation 3.22 yields

    $g_{0jk} = \sum_p -g_3 w_3 (1 - x_3)(-w_{2p})(-x_{2p})(1 - x_{2p}) \frac{\partial}{\partial w_{0jk}} \left( \theta_{1p} - \sum_k w_{1kp} x_{1k} \right)$.    (3.42)

The $\theta_{1p}$ term in equation 3.42 can be eliminated since it is not a function of $w_{0jk}$.
The remaining terms can then be rearranged to produce

    $g_{0jk} = \sum_p g_3 w_3 (1 - x_3) x_{2p} (-w_{2p})(1 - x_{2p})(-w_{1kp}) \frac{\partial x_{1k}}{\partial w_{0jk}}$.    (3.43)

The value $g_{2p}$ can be substituted for the first part of equation 3.43 using equation
3.30. Also, the output of the $k$th neuron of the input layer, $x_{1k}$, is a function of the
input weight $w_{0jk}$. As a result, evaluating the partial derivative using equation 3.22
results in the equation

    $g_{0jk} = \sum_p g_{2p} (-w_{2p})(1 - x_{2p})(-w_{1kp})(-x_{1k})(1 - x_{1k}) \frac{\partial}{\partial w_{0jk}} \left( \theta_{0k} - \sum_j w_{0jk} x_{0j} \right)$.    (3.44)

Evaluating the partial derivative in equation 3.44 with respect to $w_{0jk}$, we find that
only the $j$th term of the summation is nonzero. Rearranging the terms yields

    $g_{0jk} = \sum_p g_{2p} w_{2p} (1 - x_{2p}) x_{1k} w_{1kp} (1 - x_{1k}) x_{0j}$.    (3.45)

Finally, we can replace the first four terms of equation 3.45 with the value $g_{1kp}$ using
equation 3.37. This results in the equation for the gradients of the weights of the
input layer of the network

    $g_{0jk} = \sum_p g_{1kp} w_{1kp} (1 - x_{1k}) x_{0j}$.    (3.46)

Using the same reasoning used to derive equations 3.32 and 3.39 we can express the
equation for the gradient of the input layer neuron thresholds as

    $g_{\theta_{0k}} = -\sum_p g_{1kp} w_{1kp} (1 - x_{1k})$.    (3.47)

Derivation of the equations for the gradients associated with the weights and
thresholds of the neural network is now complete. What we have found is that the gradients
for any particular layer of the network can be expressed as a function of the given
layer's weights, thresholds, inputs, outputs, and the following layer's gradients. It is
not necessary to begin at the output of the network and use the network output error
$e(t)$ to calculate the gradient for a particular weight or threshold which is several
layers back in the network. The above expressions for the gradient do, however, dictate
that the gradient calculations begin at the output of the network and the gradients
be propagated back through the network.
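
The layer-by-layer recursion can be collected into one routine for a single training pair. The sketch below is an illustration under stated assumptions: each neuron computes $f(\sum_j w_{ijk} x_{ij} - \theta_{ik})$ with the logistic sigmoid, the network output is $w_3 x_3$, and the dictionary keys (`w0`, `th0`, and so on) are names invented for the example.

```python
import math


def f(z):
    """Logistic sigmoid (assumed transfer function)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x0):
    """Return the neuron outputs x1, x2 and x3 for the inputs x0."""
    x1 = [f(sum(p["w0"][j][k] * x0[j] for j in range(len(x0))) - p["th0"][k])
          for k in range(len(p["th0"]))]
    x2 = [f(sum(p["w1"][j][k] * x1[j] for j in range(len(x1))) - p["th1"][k])
          for k in range(len(p["th1"]))]
    x3 = f(sum(p["w2"][j] * x2[j] for j in range(len(x2))) - p["th2"])
    return x1, x2, x3


def error(p, x0, y):
    """Squared error E = (y - w3 * x3)^2 / 2 for one training pair."""
    return 0.5 * (y - p["w3"] * forward(p, x0)[2]) ** 2


def gradients(p, x0, y):
    """Gradients of E, each layer computed from the following layer's gradients."""
    x1, x2, x3 = forward(p, x0)
    n0, n1, n2 = len(x0), len(x1), len(x2)
    g3 = (p["w3"] * x3 - y) * x3                                    # eq. 3.26
    g2 = [g3 * p["w3"] * (1 - x3) * x2[j] for j in range(n2)]       # eq. 3.30
    gth2 = -g3 * p["w3"] * (1 - x3)                                 # eq. 3.32
    g1 = [[g2[k] * p["w2"][k] * (1 - x2[k]) * x1[j] for k in range(n2)]
          for j in range(n1)]                                       # eq. 3.37
    gth1 = [-g2[k] * p["w2"][k] * (1 - x2[k]) for k in range(n2)]   # eq. 3.39
    # Summation over the next layer's neurons, retained as in eq. 3.46.
    back = [sum(g1[k][q] * p["w1"][k][q] for q in range(n2)) * (1 - x1[k])
            for k in range(n1)]
    g0 = [[back[k] * x0[j] for k in range(n1)] for j in range(n0)]  # eq. 3.46
    gth0 = [-back[k] for k in range(n1)]                            # eq. 3.47
    return {"w0": g0, "th0": gth0, "w1": g1, "th1": gth1,
            "w2": g2, "th2": gth2, "w3": g3}
```

For a training set with several pairs, the routine would be called once per pair and the results summed, as stated in equations 3.19 and 3.20.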
5. Fibonacci Line Search Parameters
Several parameters associated with the Fibonacci line search method must
be specified before the conjugate gradient algorithm described in this chapter can be
applied. These parameters are:

• The initial size of the interval of uncertainty

• The number of iterations that the line search should perform.

The Fibonacci line search attempts to find the best stepsize ($\alpha_k$) with which to step
along the error function surface towards the global minimum in a direction defined
by the direction vector ($d_k$). The initial interval of uncertainty is the interval over
which the algorithm will search for the optimal stepsize ($\alpha_k$). The initial interval,
therefore, establishes the minimum and maximum stepsize values. Our goal is to find
the optimal set of weights and thresholds by moving steadily down the error function
surface towards the global minimum. The lower bound of the interval, or minimum
stepsize value, is therefore zero since a negative value would move the algorithm up the
error function surface in a direction opposite the direction vector ($d_k$). Selection of an
upper bound for the interval entails a number of tradeoffs. A larger maximum value
would allow the algorithm to search over a greater interval for the optimal stepsize
($\alpha_k$). This could allow the conjugate gradient algorithm to converge to the global
minimum more quickly by enabling it to step farther down the error function surface
at each iteration of the algorithm. It could also possibly provide more protection
against being trapped in a local minimum by allowing the line search algorithm to
search beyond the confines of a local minimum. A larger interval, however, requires
that a greater number of iterations be performed to reduce the interval of uncertainty
to the required degree. This final interval of uncertainty must be small so that the
midpoint of the interval is reasonably close to the optimal stepsize value. It is this
midpoint that is the stepsize value $\alpha_k$ that will be used by the conjugate gradient
algorithm to update the weights and thresholds of the neural network. A larger final
interval of uncertainty increases the chances of a less than optimal choice for the final
stepsize. A balance must therefore be struck between the size of the initial interval of
uncertainty, the size of the final interval of uncertainty, and the number of iterations
to be performed.

Initial investigations were performed to determine the range of stepsize
values that were typical for various neural network applications. It was found that
the stepsize ($\alpha_k$) generally did not exceed a value of 10.0 and was typically less than
1.0. An initial interval of uncertainty of 10.0 was therefore used throughout the remainder
of the thesis research.

In the course of determining the initial interval of uncertainty it was found
that the line search method would occasionally yield a final stepsize value ($\alpha_k$) which
produced an error function value much greater than the previous iteration's value. It
was determined that this problem was a result of the error function surface not being
unimodal in the direction ($d_k$) along which the algorithm searched for the minimum.
If a second minimum was closer to one of the two evaluation points ($\lambda_k$ and $\mu_k$)
than the true minimum, as shown in Figure 3.5, then the algorithm would converge to
this second minimum. This would result in an error function value larger than when
the line search algorithm started. To remedy this problem, the initial interval of
uncertainty was shifted to the left so that the first point evaluated was for $\lambda_0 = 0$. If
the error function for the final stepsize value ($\alpha_k$) was greater than the error function
value with a stepsize of zero, then a stepsize of zero was returned as the final stepsize
value ($\alpha_k$). This had the effect of resetting the conjugate gradient algorithm. A
stepsize of zero caused the algorithm to retain the same weights and thresholds for
the next iteration of the algorithm. As a result, the gradient ($g_{k+1}$) at the next
iteration was identical to the previous gradient ($g_k$) and the two successive identical
gradients would produce a deflection coefficient ($\beta_k$) equal to zero. This would reset
the direction vector ($d_{k+1}$) to the value of the present gradient ($g_{k+1}$) rather than the
weighted sum of previous gradients. This had the effect of reinitializing the conjugate
gradient method, but at a new starting point ($h_k$) on the error function surface.
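
The zero-stepsize fallback amounts to one comparison after the line search returns. In the sketch below, `J` is the error function restricted to the search direction (a function of the stepsize only) and `line_search` is any routine that returns a candidate stepsize; both names are placeholders for the example rather than parts of the thesis program.

```python
def safeguarded_stepsize(J, line_search):
    """Return the line search result unless it is worse than not moving at all."""
    alpha = line_search(J)
    if J(alpha) > J(0.0):
        # Rejecting the step leaves the weights and thresholds unchanged.
        return 0.0
    return alpha
```

For example, with `J = lambda a: (a - 1.0) ** 2` a candidate of 5.0 is rejected because `J(5.0)` exceeds `J(0.0)`, while a candidate of 0.8 is accepted.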





Figure 3.5: Line profile of the error function surface 


Having fixed the initial interval of uncertainty, the number of iterations of
the line search algorithm performed during each iteration of the conjugate gradient
method was varied to determine an optimal number. Using sixteen iterations, the
conjugate gradient algorithm was able to consistently reduce the value of the error
function. The value of the error function did not consistently drop when fewer than
sixteen iterations were used. Using equation 3.9 this resulted in a final interval of
uncertainty of 0.00626.
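
The figure of 0.00626 follows directly from equation 3.9 with an initial interval of 10.0 and sixteen iterations. The short sketch below reproduces the arithmetic, assuming the Fibonacci numbers are indexed so that $F_0 = F_1 = 1$ (which gives $F_{16} = 1597$); the function names are illustrative.

```python
def fibonacci(n):
    """n-th Fibonacci number with the assumed indexing F_0 = F_1 = 1."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def final_interval(initial_interval, n):
    """Size of the interval of uncertainty after n iterations (equation 3.9)."""
    return initial_interval / fibonacci(n)


def iterations_needed(initial_interval, target):
    """Smallest n whose final interval does not exceed the target size."""
    n = 0
    while final_interval(initial_interval, n) > target:
        n += 1
    return n
```

Here `final_interval(10.0, 16)` evaluates to 10/1597, or about 0.00626, and `iterations_needed(10.0, 0.01)` returns 16 because fifteen iterations leave an interval of 10/987, which is still slightly larger than 0.01.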
C. COMPUTER PROGRAM IMPLEMENTATION
1. Conjugate Gradient Algorithm

The conjugate gradient algorithm was implemented for a multiple input,
single output neural network using the C programming language. A flow chart showing
the basic functions that are performed by the program is shown in Figure 3.6. The
user is prompted at the start of the program for the number of neurons in each stage
of the neural network, the number of iterations of the conjugate gradient algorithm
that should be performed, and the name of the input file that contains the training
data that the algorithm will use to adapt the weights and thresholds of the network.
The number of neurons allowed in the network is limited to a total of 50 and the
number of weights connecting the neurons is limited to 500. This maximum number
of neurons and weights was more than large enough for the various problems to which
the program was applied. The training data file consists of columns of data in which
each column is associated with an input to the neural network except for the last
column. The last column is the desired output of the neural network. Each row is a
separate training data set. Upon completion of the program three files are produced.
The first is a file that contains the final results. The first column of the file is the
desired value and the second column is the value that the neural network produced
using the final weights and thresholds of the network. If the algorithm has performed
as expected and reduced the error function to a small value, then the two columns of
data should be nearly identical. The second output file produced contains the final
weights and thresholds of the network. This file can then be used by any other
program which simulates the operation of a neural network with the same configuration
of neurons. The final file is produced only if the neural network has two inputs. The
file consists of a 21 x 21 matrix of neural network output values that were produced
by applying a sequence of twenty-one evenly spaced values between 0.0 and 1.0 to
each of the two inputs. The resulting file can be used to produce a three dimensional
mesh of the output surface of the neural network. Examples of the input screen,
output screen, and both the input and output files are contained in Appendix A. A
copy of the C program source code is contained in Appendix B.
2. Backpropagation Algorithm
In order to evaluate the conjugate gradient algorithm's performance, the
backpropagation method was also implemented. The basic flow chart for the
backpropagation method is shown in Figure 3.7. Because of the similarity between the
conjugate gradient method and the backpropagation method, this required only a few
changes to the program that implemented the conjugate gradient algorithm. These
changes consisted of

• Replacing the stepsize value ($\alpha_k$) calculated by the Fibonacci line search with a
user-specified constant referred to as the learning rate by the backpropagation
method.

• Replacing the deflection coefficient ($\beta_k$) which is calculated for every iteration
of the algorithm with a user-specified constant referred to as the momentum
factor by the backpropagation method.

• Updating the weights and thresholds of the neural network after the application
of each training data set rather than upon completion of a complete pass through
the entire training data file.

The input and output files remain the same as those for the conjugate gradient version
of the program.
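
The correspondence between the two methods can be made concrete with a single update routine whose stepsize and deflection arguments are either computed (conjugate gradient) or held constant (backpropagation). This is a sketch under an assumed sign convention, $d_{k+1} = -g_{k+1} + \beta_k d_k$ and $h_{k+1} = h_k + \alpha_k d_k$; the thesis defines its own convention in equations not reproduced here, and the function name is hypothetical.

```python
def update(h, d_prev, g, stepsize, deflection):
    """One coefficient update shared by both methods.

    Conjugate gradient: stepsize from the line search, deflection recomputed
    every iteration, g summed over the whole training set.
    Backpropagation: stepsize is the learning rate, deflection is the momentum
    factor, g taken from a single training pair.
    """
    d = [-gi + deflection * di for gi, di in zip(g, d_prev)]
    h_new = [hi + stepsize * di for hi, di in zip(h, d)]
    return h_new, d
```

A backpropagation step with a learning rate of 0.1 and a momentum factor of 0.9 would be `update(h, d_prev, g, 0.1, 0.9)`, called once per training pair.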
The following chapter compares the performance of the conjugate gradient
and backpropagation algorithms and also presents the results of several neural network
applications.

Figure 3.6: Conjugate gradient algorithm flowchart

Figure 3.7: Backpropagation algorithm flowchart
IV. RESULTS

In this chapter, the results of the research conducted on neural networks using
the conjugate gradient method are presented. The chapter is divided into two parts.
The first concerns the performance of the conjugate gradient algorithm compared
to that of the backpropagation method. The second provides several examples of
neural network applications. Where possible, the performance of the neural network
is compared to its linear counterpart.

A. CONJUGATE GRADIENT ALGORITHM PERFORMANCE
1. Performance Measures

The rationale for implementing the conjugate gradient algorithm was to
develop an alternative to the backpropagation method that would converge more
quickly to the optimal set of weights and thresholds for a given problem. The error
function ($E$) is a measure of whether the weights and thresholds of a neural network
are optimum when applied to a particular problem. The smaller the error function
value, the more nearly optimum the weights and thresholds are. Both algorithms
reduce the value of the error function by iteratively adapting the weights and
thresholds of the neural network. The rate at which the backpropagation and conjugate
gradient algorithms converge to the optimal set of weights and thresholds can be
measured using several methods. The simplest approach would be to determine the
number of iterations each algorithm requires to reduce the value of the error function
to a prescribed level. The number of iterations for each algorithm would then be
compared and the algorithm requiring fewer iterations would be considered to
converge more quickly. This approach does not, however, take into account the greater
computational complexity of the conjugate gradient method. A more accurate
measure of performance for the purposes of comparison is the number of multiplications
performed by each algorithm. This measure better reflects the relative computational
requirements of the two algorithms. The number of multiplications performed by each
of the methods over one iteration is fixed. We can therefore calculate a multiplication
ratio of the two methods and then use this ratio in conjunction with the number of
iterations to compare their relative performance.
2. Calculation of the Multiplication Ratio

The number of multiplications performed by both the backpropagation
method and the conjugate gradient method over one iteration is a function of several
variables. These include the number of neurons and weights in the network, the size
of the training data file used to train the network, and the number of iterations
performed by the Fibonacci line search method. Tables 4.1 and 4.2 show the number of
multiplications required by various functions of the conjugate gradient and
backpropagation methods, respectively. The tables also show the total number of times each
function is performed during a single iteration of the algorithm. The variable $T$ is the
number of training data sets used to train the network, the variable $P$ is the number
of weights and thresholds in the network, and $R$ is the number of neurons in the
network. Table 4.1 figures reflect that the stepsize ($\alpha_k$) is calculated using sixteen
iterations of the Fibonacci line search algorithm. The total number of multiplications
($M$) performed by each of the algorithms is therefore

    $M_{CG} = T(20P + 37R + 17) + 21(P + R) + 35$    (4.1)

for the conjugate gradient method and

    $M_{BP} = T(5P + 5R)$    (4.2)

for the backpropagation method.

TABLE 4.1: MULTIPLICATIONS - CONJUGATE GRADIENT METHOD

TABLE 4.2: MULTIPLICATIONS - BACKPROPAGATION METHOD
Columns: Function; Times performed; Number of multiplies
Rows: Calculate network output; Calculate gradient vector; Update direction vector; Update weight vector

We can then derive the multiplication ratio by dividing $M_{CG}$ by $M_{BP}$ to obtain

    $\mathrm{RATIO} = \frac{M_{CG}}{M_{BP}} = \frac{T(20P + 37R + 17) + 21(P + R) + 35}{T(5P + 5R)}$.    (4.3)

Equation 4.3 can then be factored into four terms as shown below

    $\mathrm{RATIO} = 4 + \frac{17(R + 1)}{5(P + R)} + \frac{21}{5T} + \frac{7}{T(P + R)}$.    (4.4)

For the purposes of approximation, the last two terms of equation 4.4 can be
eliminated since the number of training data sets used to train the neural network is
typically large. As the number of neurons in a network is increased, the number of
connections or weights in the network increases at a much greater rate. This happens
because each neuron in a given layer is connected to every neuron in the next layer
of the network. As a result, the second term of equation 4.4 steadily decreases as
the number of neurons is increased. The lower bound on the multiplication ratio is
therefore approximately four and the upper bound can be set at approximately five
for networks having more than just a few neurons.
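
The bounds can be checked for a particular network size. The sketch below assumes the multiplication counts $M_{CG} = T(20P + 37R + 17) + 21(P + R) + 35$ and $M_{BP} = T(5P + 5R)$; the coefficient of $R$ in the first expression is partly illegible in the source, so the value 37 used here should be treated as an assumption, and the function names are illustrative.

```python
def multiplications_cg(T, P, R):
    """Multiplications per conjugate gradient iteration (assumed coefficients)."""
    return T * (20 * P + 37 * R + 17) + 21 * (P + R) + 35


def multiplications_bp(T, P, R):
    """Multiplications per backpropagation pass over the training set."""
    return T * (5 * P + 5 * R)


def multiplication_ratio(T, P, R):
    """Ratio of the two counts, computed directly."""
    return multiplications_cg(T, P, R) / multiplications_bp(T, P, R)


def multiplication_ratio_terms(T, P, R):
    """The same ratio written as four separate terms."""
    return 4 + 17 * (R + 1) / (5 * (P + R)) + 21 / (5 * T) + 7 / (T * (P + R))
```

For instance, with 100 training sets, 50 weights and thresholds, and 10 neurons, both functions give a ratio of about 4.67, which lies between the approximate bounds of four and five.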
3. Performance Results

The performance of the conjugate gradient method was compared to the
performance of the backpropagation method using two different training problems.
The first consisted of training the neural network to produce a binary output of either
one or zero depending on the inputs to the network. The second problem involved
training the neural network to produce a specific value within the range of zero to
one for a given set of inputs to the network.

A plot of the normalized value of the error function versus the number of
iterations performed for the binary problem is pictured in Figure 4.1 for the
backpropagation algorithm and in Figure 4.2 for the conjugate gradient algorithm. Note the
difference in the horizontal scale of the two figures. The error function steadily
decreased for the conjugate gradient method while the error function actually increased
for approximately the first 100 iterations of the backpropagation algorithm. Also
note that the error function's rate of change was much more even for the conjugate
gradient algorithm than for the backpropagation method.

In order to compare the relative performance of the two algorithms, the
multiplication ratio's upper bound of five was used. Pictured in Figure 4.3 is a
comparison of the two algorithms' convergence rates with respect to the approximate
number of multiplications performed by each algorithm. As can be seen, for the binary
case, the conjugate gradient method consistently outperformed the backpropagation
method for any given number of multiplications performed.

The results were even more apparent for the continuous output problem.
The backpropagation method was unable to significantly reduce the error function's
value for the first 500 iterations of the algorithm as is shown in Figure 4.4. The
conjugate gradient method, however, steadily reduced the value of the error function
after each iteration of the algorithm (Figure 4.5). Comparison of the convergence
rates of the two methods with respect to the number of multiplications required in
each case is shown in Figure 4.6. For any given number of multiplications the
conjugate gradient method greatly outperformed the backpropagation method.

The conclusion from the two examples above is that the conjugate gradient
method performs as well as or better than the backpropagation method with respect to
both the number of iterations and the number of multiplications required to reduce
the error function to a desired level. The conjugate gradient method therefore satisfies
one goal of this thesis which was to develop an alternative to the backpropagation
method that would converge more quickly to the optimal set of weights and thresholds
for any given problem.
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Figure 4.1: Binary problem - backpropagation 
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Figure 4.2: Binary problem - conjugate gradient 
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Figure 4.3: Binary problem - comparison 
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Figure 4.4: Continuous problem - backpropagation 
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Figure 4.5: Continuous preblem - conjugate gradient 
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Figure 4.6: Continuous problem - comparison 
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B. NEURAL NETWORK APPLICATION RESULTS 
Several simple applications were chosen to evaluate the performance of the
conjugate gradient method vis-a-vis the backpropagation method. These applications
were also used to develop a better understanding of the potential signal processing
applications for the neural network. When possible, the neural network's performance
was compared to its linear counterpart.
1. A Classification Problem

The goal for this problem was to train a neural network to differentiate
between two classes of inputs. The two classes of inputs consisted of points which
fell either inside or outside of a circle with a diameter of 0.5 centered within a
unit square as shown in Figure 4.7. This classification problem, although relatively
simple, is representative of one of the primary tasks to which neural networks have
been applied: pattern recognition and classification [Ref. 1:pp. 66-67].

The points used to construct the training data file were evenly spaced 0.1
apart from zero to one for both the x_0 and x_1 coordinates as shown in Figure 4.7.
This produced a total of 121 points over the unit square. The training data file
was composed of 121 data sets, each set consisting of the coordinates for one of the
training points and a value representing the desired class to which the point belonged.
The desired value for a point falling inside the circle was a one. The desired value
for a point falling outside the circle was a zero. The conjugate gradient algorithm
was used to train a neural network which had two inputs, eight first layer neurons,
four second layer neurons, and one output neuron (a 2-8-4-1 configuration). After
100 iterations of the algorithm, the total squared error summed over the entire 121
training data sets was 6.26 x 10^-?. The resulting output of the neural network as a
function of its inputs is pictured in Figure 4.8.
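The construction of the training data described above can be sketched in C as follows. This is only an illustration under stated assumptions: the structure name `training_set`, the function name `build_circle_data`, the circle center at (0.5, 0.5), and the use of a strict "less than" test for the circle boundary are not taken from the thesis program.

```c
#define GRID_POINTS 11                        /* 0.0, 0.1, ..., 1.0 on each axis */
#define NUM_SETS (GRID_POINTS * GRID_POINTS)  /* 121 training data sets */

/* One training data set: two input coordinates and the desired class.
   The layout is hypothetical, chosen only for this sketch. */
struct training_set {
    double x0;
    double x1;
    double desired;   /* 1.0 inside the circle, 0.0 outside */
};

/* Fill sets[] with the 121 grid points and return how many of them fall
   inside the circle of diameter 0.5 centered at (0.5, 0.5).
   ASSUMPTION: a point is "inside" when its distance from the center is
   strictly less than the radius; no grid point lies on the boundary. */
int build_circle_data(struct training_set sets[NUM_SETS])
{
    const double radius = 0.25;
    int i, j, n = 0, inside = 0;

    for (i = 0; i < GRID_POINTS; i++) {
        for (j = 0; j < GRID_POINTS; j++) {
            double x0 = i / 10.0;
            double x1 = j / 10.0;
            double dx = x0 - 0.5;
            double dy = x1 - 0.5;

            sets[n].x0 = x0;
            sets[n].x1 = x1;
            sets[n].desired = (dx * dx + dy * dy < radius * radius) ? 1.0 : 0.0;
            if (sets[n].desired == 1.0)
                inside++;
            n++;
        }
    }
    return inside;
}
```

With this grid, 21 of the 121 points fall inside the circle, so the two classes are unevenly represented in the training data.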
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Figure 4.7: Training data for the classification problem 


The neural network produced output values ranging from 1.56 x 10^-3 to
1.0 and was able to properly identify the class to which each of the training data
points belonged. The contour plot of the neural network output for a single contour 
value of 0.5 is shown in Figure 4.9. The plot clearly shows that the conjugate gradient 
algorithm was able to calculate a set of weights and thresholds for the neural network 
that very closely approximates the desired result. A circular decision region was 
formed that allowed the neural network to differentiate between points falling inside 
the circle and points falling outside the circle. This is because a neural network, due
to its nonlinearity, has the ability to form arbitrarily complex decision regions.

This simple example clearly demonstrates the ability of a neural network
to produce a nonlinear mapping of a set of analog inputs to a single binary output
value. In this case, this nonlinear mapping was used to produce the two decision
regions pictured in Figure 4.9. For other applications, the formation of decision
regions may not be called for. Rather, the output of the network may have to be
continuously variable.

2. Nonlinear Time Series Prediction 

The previous problem required the neural network to produce only a binary
output of one or zero. The second application was selected so that the conjugate
gradient algorithm's performance could be evaluated for the case of a continuously
variable range of desired output values. This type of application falls into a second
class of tasks to which the neural network can be applied: nonlinear mapping of a
set of analog inputs to an analog output value [Ref. 1:p. 67]. It was decided to apply
the neural network to the problem of one-step prediction of a nonlinear time series.
One-step prediction is a fairly common application in digital signal processing. A
nonlinear time series was used since one-step prediction for a linear time series could
easily be accomplished using a linear filter rather than a neural network. The method
used to perform the prediction is similar to that used by a linear predictor. The
next value in the series is predicted using the previous values of the series. The basic
configuration is pictured in Figure 4.10.

Figure 4.8: Neural network output versus input
Figure 4.9: Neural network output contour plot





Figure 4.10: Time series predictor


For a linear predictor, the output of the predictor is merely a weighted sum
of a given number of previous values of the series. The neural network, however, can
produce an output which is a nonlinear function of a given number of previous values.
The nonlinear time series used to train and evaluate the conjugate gradient algorithm
was produced using equation 4.5.
This equation is referred to as the classic logistic or Feigenbaum map and has been
studied quite extensively because of its simplicity and its application to chaos theory.
This iterated equation (equation 4.5) produces an ergodic, chaotic time series that is
bounded and quasi-periodic [Ref. 12:p. 10]. A training sequence of 100 samples was
generated using equation 4.5 with the variable B equal to 1.0. This sequence was
then used to adaptively calculate the optimal coefficients for a linear second order
prediction filter using a recursive least squares method. The linear predictor's results
are pictured in Figure 4.11. Only the first fifty samples of the sequence were plotted
so that the two curves on the graph could be better differentiated. It is obvious
from Figure 4.11 that the linear predictor was unable to accurately predict the next
value in the nonlinear series using the two previous values of the series. When the
difference between the actual and predicted signals is plotted, one can see that the
magnitude of the error is almost as great as the magnitude of the original signal (see
Figure 4.12). As was expected, the linear predictor performs poorly for a nonlinear
problem.
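The linear predictor experiment can be approximated with the short C sketch below. Several points are assumptions rather than details from the thesis: the logistic map is taken in its common textbook form x[t] = 4*B*x[t-1]*(1 - x[t-1]), the starting value is arbitrary, the coefficients are found by batch least squares (the 2x2 normal equations) instead of the recursive least squares method named in the text, and the function names `logistic_series` and `fit_linear_predictor` are hypothetical.

```c
#include <stddef.h>

/* Generate n samples of the logistic map x[t] = 4*b*x[t-1]*(1 - x[t-1]).
   ASSUMPTION: this textbook form and the starting value x0 are
   illustrative; the text only names the map and states B = 1.0. */
void logistic_series(double x[], size_t n, double b, double x0)
{
    size_t t;

    if (n == 0)
        return;
    x[0] = x0;
    for (t = 1; t < n; t++)
        x[t] = 4.0 * b * x[t - 1] * (1.0 - x[t - 1]);
}

/* Fit xhat[t] = c[0]*x[t-1] + c[1]*x[t-2] by batch least squares and
   return the mean squared prediction error over t = 2..n-1.
   Requires n >= 4 and a series that is not constant. */
double fit_linear_predictor(const double x[], size_t n, double c[2])
{
    double r11 = 0.0, r12 = 0.0, r22 = 0.0, p1 = 0.0, p2 = 0.0;
    double det, sse = 0.0;
    size_t t;

    /* Accumulate the entries of the 2x2 normal equations. */
    for (t = 2; t < n; t++) {
        r11 += x[t - 1] * x[t - 1];
        r12 += x[t - 1] * x[t - 2];
        r22 += x[t - 2] * x[t - 2];
        p1  += x[t] * x[t - 1];
        p2  += x[t] * x[t - 2];
    }

    /* Solve [r11 r12; r12 r22] * c = [p1; p2] by Cramer's rule. */
    det = r11 * r22 - r12 * r12;
    c[0] = (p1 * r22 - p2 * r12) / det;
    c[1] = (p2 * r11 - p1 * r12) / det;

    /* Measure the prediction error of the fitted filter. */
    for (t = 2; t < n; t++) {
        double e = x[t] - c[0] * x[t - 1] - c[1] * x[t - 2];
        sse += e * e;
    }
    return sse / (double)(n - 2);
}
```

Calling `logistic_series(x, 100, 1.0, 0.3)` followed by `fit_linear_predictor(x, 100, c)` gives a mean squared error that remains a sizable fraction of the signal's own power, which is in line with the poor linear prediction described above.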

The same training sequence was then used by the conjugate gradient algorithm
to train a neural network with a 2-4-2-1 configuration. The network was
trained to predict the next value of the series based on the two previous values.
After 100 iterations, the sum of the squared errors over the 100 training data sets was
7.25 x 10^-3. This would equate to an average standard deviation from the actual
signal of approximately 8.51 x 10^-3. The neural network's results are pictured in Figure
4.13. It is apparent that the neural network performed much better than the linear
predictor. The prediction error for the neural network is pictured in Figure 4.14. The
magnitude of the neural network's prediction error is much smaller than that for the
linear predictor. This error could also be reduced even further if additional iterations
of the conjugate gradient algorithm were performed.
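The per-sample figure quoted above can be checked from the total squared error, treating the "average standard deviation" as a root-mean-square error over the N = 100 training data sets. The symbols \sigma_e, e_i, and N are notation introduced here for the check and do not appear in the thesis.

```latex
\sigma_e \approx \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^{2}}
         = \sqrt{\frac{7.25 \times 10^{-3}}{100}}
         = \sqrt{7.25 \times 10^{-5}}
         \approx 8.51 \times 10^{-3}
```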

This example demonstrates that a neural network is quite capable of
performing nonlinear mapping of a set of analog inputs to an analog output. The neural
network can also produce more accurate results than the linear approach when the
problem to be solved is nonlinear. It must be recognized, however, that although
the neural network produces more accurate results, it is much more computationally
complex than the linear approach to the problem.
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Figure 4.11: Linear predictor output and actual signal 





Figure 4.12: Linear prediction error 
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Figure 4.13: Neural network predicted and actual signal 
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Figure 4.14: Neural network prediction error 
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3. Channel Equalization 
One final example will serve to demonstrate the potential applications for
the neural network. The idea of using a neural network to perform channel
equalization for a nonminimum phase transmission channel was borrowed from Gibson, Siu,
and Cowan [Ref. 15]. The experimental results indicate that a neural network could
potentially provide superior performance to its linear counterpart when the channel
over which the digital data is transmitted is nonminimum phase.
a. Transmission Channel Model and Equalizer Model
When digital data is transmitted, it frequently becomes distorted by
the channel over which it travels. This distortion can frequently be modeled using
a linear time invariant (LTI) system [Ref. 8:p. 426]. The channel model, shown in
Figure 4.15, consists of the transfer function H(z) and a channel noise term n_i. The
transfer function of the channel is defined by a finite impulse response (FIR) equation

    H(z) = \sum_{j=0}^{k} a_j z^{-j}                                    (4.6)

The channel noise term n_i is typically assumed to be zero mean, additive white
Gaussian noise. The purpose of a channel equalizer, also shown in Figure 4.15, is to
reverse the distorting effects of the channel and to recover the original signal (x_i) using
m samples of the received signal, y_i, y_{i-1}, ..., y_{i-m+1}.

Figure 4.15: Channel model and equalizer

If we assume, for a moment, that the noise term (n_i) is zero, then the received
signal y_i is merely a weighted sum of the present and past values of the original
signal x_i. This can be expressed as

    y_i = \sum_{j=0}^{k} a_j x_{i-j}                                    (4.7)

where a_j are the k + 1 coefficients associated with the channel transfer function H(z).
For a binary signal (±1), therefore, the received signal y_i can assume only one of 2^{k+1}
possible values. If we then try to estimate the original signal x_i using an m sample
vector [y_i, y_{i-1}, ..., y_{i-m+1}], we can only form a fixed number of permutations of the
received signal vector. Each received signal vector [y_i, y_{i-1}, ..., y_{i-m+1}] belongs to
either the set of vectors corresponding to a transmitted binary one (+1) or the set of
vectors corresponding to a transmitted binary zero (-1). The channel equalizer
produces an estimate of the transmitted signal x_i by determining which set the received
signal vector belongs to. It has been shown that a linear transversal equalizer can
perform such an operation if the channel transfer function H(z) is minimum phase
[Ref. 15:p. 1184]. If the channel transfer function is not minimum phase, then the two
received vector sets are not linearly separable and a linear equalizer cannot accurately
estimate x_i based on the received data vector set [y_i, y_{i-1}, ..., y_{i-m+1}]. If a delay d
is introduced in the calculation of x_i, such that at the ith iteration the equalizer
estimates the original signal x_{i-d}, then accurate estimation of the original signal can
be achieved [Ref. 15:p. 1184]. This value for d, however, may not be known, or may
vary with time. The result is that a linear transversal equalizer, even with a delay,
may not be able to satisfactorily equalize a nonminimum phase channel.
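The noise-free channel model above, in which the received sample is a weighted sum of the present and past transmitted samples, can be sketched in C as follows. The function names `channel_output` and `count_distinct_outputs` and the limit `MAX_ORDER` are hypothetical and were chosen only for this illustration.

```c
#define MAX_ORDER 7   /* largest channel order k handled by this sketch */

/* Noise-free output of an FIR channel with coefficients a[0..k]:
   y_i = a[0]*x[i] + a[1]*x[i-1] + ... + a[k]*x[i-k].
   The caller must ensure i >= k. */
double channel_output(const double a[], int k, const double x[], int i)
{
    double y = 0.0;
    int j;

    for (j = 0; j <= k; j++)
        y += a[j] * x[i - j];
    return y;
}

/* Count the distinct noise-free outputs over all 2^(k+1) binary (+1/-1)
   input patterns. Returns -1 if k is outside 0..MAX_ORDER. */
int count_distinct_outputs(const double a[], int k)
{
    double seen[1 << (MAX_ORDER + 1)];
    double x[MAX_ORDER + 1];
    int patterns, count = 0, p, j, s;

    if (k < 0 || k > MAX_ORDER)
        return -1;
    patterns = 1 << (k + 1);

    for (p = 0; p < patterns; p++) {
        double y;
        int is_new = 1;

        /* Bit j of p selects the sign of the sample x[j]. */
        for (j = 0; j <= k; j++)
            x[j] = ((p >> j) & 1) ? 1.0 : -1.0;
        y = channel_output(a, k, x, k);

        /* Compare against the outputs found so far. */
        for (s = 0; s < count; s++) {
            double d = seen[s] - y;
            if (d < 0.0)
                d = -d;
            if (d < 1e-9) {
                is_new = 0;
                break;
            }
        }
        if (is_new)
            seen[count++] = y;
    }
    return count;
}
```

For the coefficients {0.5, 1.0} with k = 1, the count is four (the values ±0.5 and ±1.5). The figure 2^(k+1) is an upper limit: coefficients such as {1.0, 1.0} produce only three distinct values because two input patterns give the same output.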
b. A Nonminimum Phase Channel Equalizer 
The ability of a neural network to form arbitrary decision regions,
demonstrated in Chapter II, could possibly remedy this problem. To investigate this
concept, the first order nonminimum phase transfer function H(z) = 0.5 + z^{-1} was
used to evaluate the performance of both a neural network and a linear transversal
equalizer. The possible values for y_i using this transfer function are: +1.5, +0.5, -0.5,
and -1.5. A two input neural network and two input linear transversal equalizer were
used since the channel's transfer function was only first order, and this allowed a
graphical analysis of the problem. The eight possible combinations of y_i and y_{i-1} are
shown in Figure 4.16. The symbol x indicates that the original signal x_i had a value
of -1 and the symbol o indicates that x_i was equal to +1. Notice that the symbols
are intermixed such that no single line can be drawn that will completely separate the
two classes of symbols. This is what makes the nonminimum phase case intractable
for the linear transversal equalizer. If the noise term, n_i, is now incorporated into
the problem, the result is as shown in Figure 4.17 for a signal-to-noise ratio (SNR)
of 10 dB. The number of possible values for y_i becomes infinite, but the points are
distributed about the original eight points shown in Figure 4.16. The coefficients
for a first order linear transversal equalizer were calculated by applying a recursive
least squares (RLS) algorithm to the set of 500 consecutive values of y_i pictured in
Figure 4.17. The values for y_i were generated by using a random sequence of +1 and
-1 for x_i, applying this binary sequence to the transfer function given above, and
adding a normally distributed noise term with a standard deviation equivalent to a
signal-to-noise ratio of 10 dB. The linear transversal equalizer's two decision regions
are pictured in Figure 4.18. The region that is shaded with dots is the area for which
the linear transversal equalizer produced an estimate of +1 for x_i, and the unshaded
region is where the equalizer produced an estimate of -1 for x_i. Note that the best that
the linear equalizer could do was to define two decision regions such that three of the
four possible points fell within the proper region. The same 500 value data set was
then used to train a neural network having a 2-6-4-1 configuration. The decision
regions formed by the neural network after 100 iterations of the conjugate gradient
algorithm are pictured in Figure 4.19. The neural network, because of its ability
to account for the nonlinearities, was able to form two separate decision regions for
each of the two possible values for x_i. The four decision regions properly encompass
the eight possible points associated with y_i and y_{i-1}. As a result, the total number
of errors produced over the 500 value training set dropped from 151 for the linear
equalizer to 65 for the neural network. The neural network's ability to form more
complex decision regions allowed it to more accurately perform equalization when the
transfer function was nonminimum phase.
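The eight noise-free points discussed above can be enumerated with a few lines of C. This is a sketch for illustration only; the structure `eq_point` and the function `enumerate_points` are hypothetical names, and the channel is taken to be the first order transfer function with coefficients 0.5 and 1.0 used in this section.

```c
/* One noise-free point in the (y_i, y_{i-1}) plane together with the
   transmitted bits that produced it. */
struct eq_point {
    double y_now;    /* y_i     = 0.5*x_i     + x_{i-1} */
    double y_prev;   /* y_{i-1} = 0.5*x_{i-1} + x_{i-2} */
    int x_now;       /* x_i: the class when no delay is used */
    int x_prev;      /* x_{i-1}: the class when a delay of one is used */
};

/* Enumerate the eight combinations of (x_i, x_{i-1}, x_{i-2}), each +1
   or -1, for the channel with coefficients 0.5 and 1.0. */
void enumerate_points(struct eq_point pts[8])
{
    int p;

    for (p = 0; p < 8; p++) {
        int x0 = (p & 4) ? 1 : -1;   /* x_i     */
        int x1 = (p & 2) ? 1 : -1;   /* x_{i-1} */
        int x2 = (p & 1) ? 1 : -1;   /* x_{i-2} */

        pts[p].y_now  = 0.5 * x0 + x1;
        pts[p].y_prev = 0.5 * x1 + x2;
        pts[p].x_now  = x0;
        pts[p].x_prev = x1;
    }
}
```

One way to see the intermixing is that the point (0.5, -0.5), which belongs to the class x_i = -1, equals 0.5*(-0.5, -1.5) + 0.25*(1.5, -0.5) + 0.25*(1.5, 1.5), a weighted average of three points of the class x_i = +1. A point of one class lying inside the region spanned by the other class rules out any single separating line.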
Figure 4.17: Possible combinations of y_i and y_{i-1} with noise added
Figure 4.19: Neural network decision regions

c. A Nonminimum Phase Channel Equalizer Using a Delay

It was stated earlier that introduction of a delay d could allow the
linear equalizer to more accurately perform its equalization function. Pictured in
Figure 4.20 are the eight possible points associated with y_i and y_{i-1} for a delay of
one sample (i.e., the estimate of x_{i-1} based on the samples y_i and y_{i-1}). The two
classes of points are no longer intermixed as they were for the case of no delay. A set
of coefficients for the linear equalizer can therefore be calculated that will properly
separate the two sets of points. With noise added, however, the sets of points begin to
intermix as shown in Figure 4.21 for a signal-to-noise ratio of 10 dB. The separation
of the two classes becomes more difficult, particularly for the linear equalizer, which
can only use a single line to define the decision boundary. The coefficients for the
linear equalizer were again calculated using the RLS algorithm and the 500 values
for y_i pictured in Figure 4.21. The resulting decision regions are shown in Figure
4.22. Comparison of the two decision regions with the original training data (Figure
4.21) indicates that the linear equalizer was unable to define a single line that could
separate all the points into their proper regions. The linear equalizer produced a
total of 19 errors over the 500 values of the training data set. The same training data
set was then used to train a neural network with a 2-6-4-1 configuration using the
conjugate gradient algorithm. After twenty iterations, the neural network produced
the two decision regions pictured in Figure 4.23. The boundary between the two
decision regions is no longer a straight line but is shaped to take into account the
distribution of points caused by the introduction of noise. The neural network only
produced a total of 3 errors over the 500 value training set.
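The effect of the one-sample delay in the noise-free case can be illustrated with a very simple decision rule. The sketch below is not the equalizer trained in the thesis: the boundary (the line y_i = 0) is hand-picked for illustration, and the function names `decide_delayed_bit` and `count_noise_free_errors` are hypothetical.

```c
/* Illustrative decision rule for a delay of one sample: estimate
   x_{i-1} from the sign of y_i alone. This hand-picked boundary is an
   assumption made for the sketch, not the RLS-trained coefficients
   reported in the text. */
int decide_delayed_bit(double y_now)
{
    return (y_now >= 0.0) ? 1 : -1;
}

/* Count how many of the eight noise-free patterns of (x_i, x_{i-1},
   x_{i-2}) the rule gets wrong for the channel with coefficients 0.5
   and 1.0. The bit x_{i-2} does not affect y_i, so each value of y_i
   is simply visited twice. */
int count_noise_free_errors(void)
{
    int p, errors = 0;

    for (p = 0; p < 8; p++) {
        int x_now  = (p & 4) ? 1 : -1;         /* x_i     */
        int x_prev = (p & 2) ? 1 : -1;         /* x_{i-1} */
        double y_now = 0.5 * x_now + x_prev;   /* y_i     */

        if (decide_delayed_bit(y_now) != x_prev)
            errors++;
    }
    return errors;
}
```

Because x_{i-1} carries the larger channel coefficient, the sign of y_i always matches x_{i-1} when no noise is present, so a single straight line separates the two classes. Once noise is added the points spread out and a fixed line of this kind begins to make errors, as described above.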
Figure 4.21: Possible combinations of y_i and y_{i-1} with noise added (with delay)
Figure 4.22: Linear equalizer (with delay) decision regions
Figure 4.23: Neural network (with delay) decision regions

d. A Performance Comparison

The results from the two above examples would tend to indicate that
a neural network can produce superior results to the linear equalizer both when
a delay is introduced and when a delay is not introduced. In order to confirm this
result, the performance of both the linear transversal equalizer and the neural network
was evaluated for various signal-to-noise ratios. The measure of performance for
the test was the average bit error probability. The four signal-to-noise ratios 5.0
dB, 10 dB, 20 dB, and 25 dB were used to generate four different sets of training
sequences. Each sequence was generated using a different signal-to-noise ratio. Both
the linear equalizer and the neural network were then trained using these four 500
value sequences for y_i. After calculating the coefficients for the linear equalizer and
the weights and thresholds for the neural network, the bit error performance of each
type of equalizer was calculated by passing the same 100,000 bit sequence through each
equalizer and counting the number of times the equalizer produced an error. The
results for the case where no delay was used are shown in Figure 4.24. As was expected,
the bit error probability for the linear equalizer with no delay was extremely poor. The
bit error probability for the neural network steadily dropped as the magnitude of the
noise fell. The lowest of the three curves shown in Figure 4.24 reflects the performance
of the neural network at the various signal-to-noise ratios after having been trained
using the 10 dB SNR training data set. Its performance is equal to or better than that of the
neural networks trained and evaluated for a specific SNR. This is because the lower
SNR forced the conjugate gradient algorithm to produce a set of decision boundaries
that were more nearly optimal. This result was even more apparent for the case when
a delay was introduced in the equalization problem (Figure 4.25). The same method
was used as described above, except that both the linear equalizer and neural network
produced an estimate of x_{i-1}, rather than x_i, based on the received signals y_i and
y_{i-1}. Once again the neural network performed better than the linear equalizer, and
the neural network trained using 10 dB data performed the best.
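The error-counting procedure described above can be sketched in C as follows. The sketch makes several assumptions: the names `equalizer_fn`, `bit_error_rate`, and `sign_of_current` are hypothetical, the noise samples are supplied by the caller (the text does not spell out how the noise standard deviation was derived from the SNR), and the stand-in equalizer is a simple sign test rather than the trained linear filter or neural network.

```c
/* An equalizer is modeled here as any function that maps the received
   pair (y_i, y_{i-1}) to a bit estimate of +1 or -1. */
typedef int (*equalizer_fn)(double y_now, double y_prev);

/* Pass the bit sequence x[0..n-1] (+1/-1) through the channel with
   coefficients 0.5 and 1.0, add the supplied noise samples, and count
   how often decide() disagrees with the transmitted bit x[i - delay].
   delay must be 0 or 1. Returns the fraction of bits in error. */
double bit_error_rate(const int x[], const double noise[], long n,
                      int delay, equalizer_fn decide)
{
    long i, errors = 0, trials = 0;
    double y_prev, y_now;

    if (n < 3)
        return 0.0;

    /* y_1 is the first sample with a complete channel history. */
    y_prev = 0.5 * x[1] + x[0] + noise[1];
    for (i = 2; i < n; i++) {
        y_now = 0.5 * x[i] + x[i - 1] + noise[i];
        if (decide(y_now, y_prev) != x[i - delay])
            errors++;
        trials++;
        y_prev = y_now;
    }
    return (double)errors / (double)trials;
}

/* Illustrative stand-in equalizer: the sign of y_i. It ignores y_{i-1}
   and is only meaningful when a delay of one sample is used. */
int sign_of_current(double y_now, double y_prev)
{
    (void)y_prev;
    return (y_now >= 0.0) ? 1 : -1;
}
```

With all noise samples set to zero and an alternating bit sequence, the stand-in equalizer makes no errors for a delay of one and is wrong on every bit for a delay of zero. The same routine could be called with any trained decision function in place of `sign_of_current`.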

One final comparison can be made between the neural network and
the linear transversal equalizer. This is a comparison of the neural network without delay
versus the linear equalizer with delay. This comparison is shown in Figure 4.26. Also
shown is the neural network's performance with a delay. The neural network without
delay did not perform as well as the linear equalizer for low signal-to-noise ratios.
As the magnitude of the noise was reduced, however, the performance of the two
approaches became comparable. The neural network with delay, however, was better
than any of the other approaches.

e. Channel Equalizer Conclusions 

The performance of both a linear transversal equalizer and a neural
network was evaluated with respect to their ability to accurately equalize a
nonminimum phase digital data channel. It was found that a linear transversal equalizer was
unable to accurately estimate the original signal because of the channel's
nonminimum phase characteristic. The neural network, because of its ability to form arbitrary
boundaries, did not suffer from this problem. Introduction of a delay allowed both
the linear transversal equalizer and the neural network to improve their performance.
Finally, a neural network using no delay showed performance comparable to that of a linear
transversal filter with a delay for high signal-to-noise ratios. The ability of the neural
network to perform equalization without introduction of a delay could prove useful,
particularly if the required delay is unknown or varies with time.

Figure 4.24: Equalizer performance (no delay) [bit error probability (log10) versus SNR (dB)]
Figure 4.25: Equalizer performance (with delay) [bit error probability (log10) versus SNR (dB); curves: Linear Eq, Neural Eq, Neural Eq (10 dB)]
Figure 4.26: Equalizer performance - all methods [bit error probability (log10) versus SNR (dB)]


V. CONCLUSIONS AND 
RECOMMENDATIONS 


A. CONCLUSIONS 

The first objective of this thesis research was to develop an alternative to the 
backpropagation method for calculating the optimal set of weights and thresholds for a 
neural network. The results presented in Chapter IV demonstrated that the conjugate 
gradient algorithm developed for this thesis was more computationally efficient than 
the well known backpropagation method. 

The second objective of this research was to develop a better understanding of
the relationship between the structure of a neural network and its ability to perform
input-to-output mapping. A graphical approach was used to analyze the internal
representations of the neural network. The results of this analysis were presented in
Chapter II.

The final objective of this thesis research was to evaluate the performance of a
neural network for several different signal processing applications. The first example
presented demonstrated the ability of a neural network to perform classification. The
second example, nonlinear time series prediction, compared the performance of a
neural network to its linear equivalent, and showed that the neural network produced
superior results. The final example illustrated the performance differences between a
neural network and a linear approach to nonminimum phase channel equalization.

These applications demonstrated that the nonlinear properties of a neural
network frequently allow the neural network to perform functions more effectively than
its linear counterpart. This is particularly true when the problem itself is
nonlinear. It must be recognized, however, that there is a cost to this increased functionality.
Calculation of the proper weights and thresholds for a given problem is much more
computationally complex. The computational complexity associated with the use of
a neural network must therefore be balanced with the accuracy desired when
deciding whether to use a neural network rather than a linear approach to solve a given
problem.


B. FUTURE RESEARCH 

In the course of this thesis research, several other areas were identified that 
merit additional study. 

1. Transfer Function Selection 

The sigmoid function used for this research produced an output that ranged
between 0 and 1. Other transfer functions could be investigated that produce a bipolar
output. This could prove to be more useful for typical signal processing applications.
One such transfer function that could be evaluated is the hyperbolic tangent function

    f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}              (5.1)

This nonlinear function produces a value which ranges between ±1 and is continuously
differentiable for all values of z.
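The relationship between a unipolar sigmoid and the hyperbolic tangent can be made explicit with a short derivation. It is assumed here, for illustration only, that the 0-to-1 sigmoid is the standard logistic function s(z) = 1/(1 + e^{-z}); the thesis may define its sigmoid with additional threshold terms, and the symbol s is introduced only for this note.

```latex
s(z) = \frac{1}{1 + e^{-z}}, \qquad
2\,s(2z) - 1 = \frac{2}{1 + e^{-2z}} - 1
             = \frac{1 - e^{-2z}}{1 + e^{-2z}}
             = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
             = \tanh(z)

\frac{d}{dz}\tanh(z) = 1 - \tanh^{2}(z)
```

Under that assumption the hyperbolic tangent is a scaled and shifted version of the same sigmoid, and its derivative can be computed from the neuron's own output, which is convenient for gradient-based training.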
2. Neural Network Dynamic Range 

The performance of a neural network having a greater dynamic range could
be investigated. The dynamic range of the neural network could be expanded by
allowing adaptation of the output weight w_3. It could also be accomplished by using
a linear transfer function for the single neuron in the output layer of the network. The
output of the network would then be a linear combination of the weighted outputs
from the second layer of the network. This is the approach taken by Lapedes and Farber in
their research [Ref. 11].

3. Internal Representations 
This thesis made no attempt to analyze the internal representations used
by the neural network to produce the desired outputs for a given set of inputs.
Research could be conducted to try to determine exactly what type of functions the
individual neurons in the network perform. This could provide further insight into
the relationship between the structure of a neural network and its ability to perform
a particular task.
4. Analysis of the Weights and Thresholds 
Research could be performed to determine if there is any analytical
significance to the final weight and threshold values for a neural network.


APPENDIX A: PROGRAM OUTPUT SCREEN 
AND DATA FILES 


A. EXAMPLE OUTPUT SCREEN 


** Conjugate Gradient Algorithm **


What is the name of the training data file? circ.dat 
How many inputs to the neural network? 2 

How many 1st layer neurons? 4 

How many 2nd layer neurons? 2 

There will be only one 3rd layer neuron. 

How many passes thru the training data set? 2 


Initial Error sum: 40.2786 


Performing iteration number 1 
Beta value: 0

Alpha value: 0.10958 

Error sum: 17.3579 


Performing iteration number 2 
Beta value: 0.00366825 

Alpha value: 3.79148 

Error sum: 17.3564 


Final error sum: 17.3557 


Where do you want the results stored? circ.res 
** Calculating final results **


Where do you want the final weight/theta values stored? circ.wgt 
** Storing final weight/theta values **


Where do you want the map matrix stored? circ.map 
** Calculating map of network ** 


B. EXAMPLE INPUT DATA FILE 


Input 1         Input 2         Desired output
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e+000     ?.0000e+000
?.0000e-001     ?.0000e+000     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e+000     ?.0000e+000
?.0000e-001     ?.0000e+000     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000
?.0000e-001     ?.0000e-001     ?.0000e+000


C. EXAMPLE RESULTS OUTPUT DATA FILE 


Desired output      Actual output
?.000000e+000       ?.733739e-001
?.000000e+000       ?.738492e-001
?.000000e+000       ?.743229e-001
?.000000e+000       ?.747937e-001
?.000000e+000       ?.752599e-001
?.000000e+000       ?.757203e-001
?.000000e+000       ?.761736e-001
?.000000e+000       ?.714368e-001
?.000000e+000       ?.718988e-001
?.000000e+000       ?.723659e-001
?.000000e+000       ?.728365e-001
?.000000e+000       ?.733092e-001
?.000000e+000       ?.737825e-001
?.000000e+000       ?.742548e-001
?.000000e+000       ?.747247e-001
?.000000e+000       ?.751906e-001
?.000000e+000       ?.756512e-001
?.000000e+000       ?.761052e-001
?.000000e+000       ?.713874e-001
?.000000e+000       ?.718452e-001
?.000000e+000       ?.723086e-001
?.000000e+000       ?.727760e-001
?.000000e+000       ?.732460e-001
?.000000e+000       ?.737171e-001
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D. EXAMPLE FINAL WEIGHTS OUTPUT DATA FILE 


2 


. 7 
‚092301 
‚061310 


. 796225 
‚618687 
‚513663 
„831033 


SI ONES 


.262880 
‚542745 


‚344298 
. 000000 


4 2 } Number of neurons in each layer 


0.621861 
-0.595949 
-0.031015 
-0.522201 
201035871 

0.325973 

0.906917 

0.139071 


-0.194039 0.288292 
0.007433 -0.795105 


0.059272 0.010853 } Input thresholds [θ0]

Input weights [w0]

1st layer weights [w1]

1st layer thresholds [θ1]

2nd layer weights [w2]

2nd layer threshold [θ2]

Output weight [w3]


APPENDIX B: PROGRAM SOURCE CODE 
LISTING 


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>

/**********************************************************************/
/* This program calculates the weights and thresholds for a           */
/* feedforward multilayer neural network using the conjugate          */
/* gradient optimization method.                                      */
/**********************************************************************/
/**********************************************************************/
/*                       FUNCTION DECLARATIONS                        */
/**********************************************************************/
int get_info(char filename[],int num_node[]);
int get_data(char filename[],double ts_data[],int num_inputs);
int init_weights(double *weight_ptr,int num_node[]);
int init_thetas(double *theta_ptr,int num_node[]);
void adapt_network(double weight[],double theta[],int num_node[],
                   int num_weights,int num_theta,double data_array[],
                   int array_size,int max_iteration);
double fire_neurons(double *activity_ptr,double *weight_ptr,
                    double *theta_ptr,int num_node[]);
void calc_gradient(double activity[],double weight[],
                   double theta_gradient[],double gradient[],
                   int num_node[],int num_weights,int num_theta);
double calc_beta(double old_gradient[],double old_theta_gradient[],
                 double new_gradient[],double new_theta_gradient[],
                 int num_inputs,int num_theta);
void update_direction(double gradient[],double direction[],double beta,
                      int num_inputs);
void update_weights(double weight[],double alpha,double direction[],
                    int num_inputs);
double calc_alpha(double weight[],double direction[],double theta[],
                  double theta_direction[],double activity[],
                  double data_array[],int array_size,int num_node[],
                  int num_weights,int num_theta);
void load_values(double *input_ptr,double *output_ptr,int total_num);
int fibon(int n);
void write_result(double weight[],double theta[],int num_node[],
                  double ts_data[],int set_size);
void map_network(double weight[],double theta[],int num_node[]);
void store_weights(double weight[],double *theta_ptr,int num_node[]);

/****************************************************************/
/*                    MAIN PROGRAM                              */
/****************************************************************/
int main(void)
{
    char filename[14];
    int max_iteration,num_node[5],num_weights,set_size,num_theta;
    double ts_data[3000],weight[400],theta[50];

    printf("\n ** Conjugate Gradient Algorithm ** \n");
    max_iteration = get_info(filename,num_node);
    set_size = get_data(filename,ts_data,num_node[0]);
    if (set_size == 0){
        exit(0);
    }
    num_weights=init_weights(weight,num_node);
    num_theta=init_thetas(theta,num_node);
    adapt_network(weight,theta,num_node,num_weights,num_theta,
                  ts_data,set_size,max_iteration);
    write_result(weight,theta,num_node,ts_data,set_size);
    store_weights(weight,theta,num_node);
    if (num_node[0] == 2){
        map_network(weight,theta,num_node);
    }
    exit(0);
}
/****************************************************************/
/*                    FUNCTION GET_INFO                         */
/****************************************************************/
int get_info(char filename[],int num_node[])
{
    int max_iteration;

    /* Width-limited scanf calls stand in for the original        */
    /* flushall()/gets() pair, which is compiler-specific and     */
    /* cannot bound the input. The width of 13 assumes the        */
    /* 14-character filename buffer declared in main(). The int   */
    /* targets are read with %d rather than %hd.                  */
    printf("\n What is the name of the training data file? ");
    scanf("%13s",filename);
    printf("\n How many inputs to the neural network? ");
    scanf("%2d",&num_node[0]);
    printf("\n How many 1st layer neurons? ");
    scanf("%2d",&num_node[1]);
    printf("\n How many 2nd layer neurons? ");
    scanf("%2d",&num_node[2]);
    printf("\n There will be only one 3rd layer neuron. ");
    num_node[3] = 1;
    num_node[4] = 1;
    printf("\n\n How many passes thru the training data set? ");
    scanf("%4d",&max_iteration);
    return(max_iteration);
}


/****************************************************************/
/*                    FUNCTION GET_DATA                         */
/****************************************************************/
int get_data(char filename[],double ts_data[],int num_inputs)
{
    FILE *stream;
    int i,num_read;

    num_read = 0;
    if ((stream = fopen(&filename[0],"r")) != NULL){
        /* Read up to 3000 values; the loop body is intentionally empty */
        for (i=0;(i < 3000)&&
                 (fscanf(stream,"%lg",ts_data + i)>0);i++)
            ;
        fclose(stream);
        if ((i%(num_inputs+1)) != 0){
            printf("\n\n ** Improper number of input data elements **");
            num_read = 0;
        }
        else{
            num_read = i/(num_inputs+1);
        }
    }
    else{
        printf("\n\n ** Could not find the specified file **");
    }
    return(num_read);
}
/****************************************************************/
/*                    FUNCTION INIT_WEIGHTS                     */
/****************************************************************/
int init_weights(double *weight_ptr,int num_node[])
{
/* The listing hard-codes 16384.0, which is half the rand() range */
/* when RAND_MAX is 32767. Deriving it from RAND_MAX keeps the    */
/* initial weights in (-1,1] on other compilers as well.          */
#define MAX_VAL ((RAND_MAX + 1.0)/2.0)
    int num_weights,i;

    srand(1);
    num_weights = 0;
    for (i=0;i<3;i++){
        num_weights += num_node[i]*num_node[i+1];
    }
    for (i=0;i<(num_node[0]*num_node[1]);i++){
        *weight_ptr++ = (1.0 - (rand()/MAX_VAL));
    }
    for (i=0;i<(num_node[1]*num_node[2]);i++){
        *weight_ptr++ = (1.0 - (rand()/MAX_VAL));
    }
    for (i=0;i<(num_node[2]*num_node[3]);i++){
        *weight_ptr++ = (1.0 - (rand()/MAX_VAL));
    }
    *weight_ptr = 1.0;
    num_weights += 1;
    return(num_weights);
}


/****************************************************************/
/*                    FUNCTION INIT_THETAS                      */
/****************************************************************/
int init_thetas(double *theta_ptr,int num_node[])
{
    int num_theta,i;

    num_theta = num_node[1]+num_node[2]+num_node[3];
    for (i=0;i<num_theta;i++){
        *theta_ptr++ = 0.0;
    }
    return(num_theta);
}
/****************************************************************/
/*                    FUNCTION ADAPT_NETWORK                    */
/****************************************************************/
void adapt_network(double weight[],double theta[],int num_node[],
                   int num_weights,int num_theta,double data_array[],
                   int array_size,int max_iteration)
{
    int iteration,i,j,set_num;
    /* The direction arrays start at zero so that the first update */
    /* (beta = 0) does not read uninitialized values.              */
    double activity[50],gradient[400],direction[400] = {0.0},gradient_sum[400];
    double actual_output,desired_output,alpha,beta,old_gradient_mag;
    double theta_gradient[50],theta_sum[50],theta_direction[50] = {0.0};
    /* old_gradient_sum is sized 50 in the listing; it is indexed   */
    /* like gradient_sum, so it is given the same 400 entries here. */
    double old_gradient_sum[400],old_theta_sum[50],error,errorsum;
    double *array_ptr;

    for (iteration=0;iteration<max_iteration;iteration++){
        for (i=0;i<num_weights;i++){
            gradient_sum[i] = 0.0;
        }
        for (i=0;i<num_theta;i++){
            theta_sum[i] = 0.0;
        }
        errorsum=0.0;
        array_ptr = data_array;
        /* Accumulate the gradient over the whole training set */
        for (set_num=0;set_num<array_size;set_num++){
            for (i=0;i<num_node[0];i++){
                activity[i] = *array_ptr++;
            }
            desired_output = *array_ptr++;
            actual_output=fire_neurons(activity,weight,theta,num_node);
            error = actual_output - desired_output;
            error *= error;
            gradient[num_weights-1] = (actual_output - desired_output)*
                                      actual_output;
            calc_gradient(activity,weight,theta_gradient,gradient,num_node,
                          num_weights,num_theta);
            for (i=0;i<(num_weights-1);i++){
                gradient_sum[i] += gradient[i];
            }
            for (i=0;i<num_theta;i++){
                theta_sum[i] += theta_gradient[i];
            }
            errorsum += error;
        }
        printf(" Error sum: %lg \n",errorsum);
        if (iteration == 0){
            beta = 0.0;
        }
        else{
            beta = calc_beta(old_gradient_sum,old_theta_sum,gradient_sum,
                             theta_sum,(num_weights-1),num_theta);
        }
        for (j=0;j<(num_weights-1);j++){
            old_gradient_sum[j] = gradient_sum[j];
        }
        for (j=0;j<num_theta;j++){
            old_theta_sum[j] = theta_sum[j];
        }
        printf("\n Performing iteration number %d \n",(iteration+1));
        printf(" Beta value: %lg \n",beta);
        update_direction(gradient_sum,direction,beta,(num_weights-1));
        update_direction(theta_sum,theta_direction,beta,num_theta);
        alpha=calc_alpha(weight,direction,theta,theta_direction,activity,
                         data_array,array_size,num_node,num_weights,
                         num_theta);
        printf(" Alpha value: %lg \n",alpha);
        update_weights(weight,alpha,direction,(num_weights-1));
        update_weights(theta,alpha,theta_direction,num_theta);
    }
    /* Report the summed squared error after the last pass */
    errorsum = 0.0;
    array_ptr = data_array;
    for (set_num=0;set_num<array_size;set_num++){
        for (i=0;i<num_node[0];i++){
            activity[i] = *array_ptr++;
        }
        desired_output = *array_ptr++;
        actual_output=fire_neurons(activity,weight,theta,num_node);
        error = actual_output - desired_output;
        error *= error;
        errorsum += error;
    }
    printf("\n Final error sum: %lg \n",errorsum);
    return;
}
/****************************************************************/
/*                    FUNCTION FIRE_NEURONS                     */
/****************************************************************/
double fire_neurons(double *activity_ptr,double *weight_ptr,
                    double *theta_ptr,int num_node[])
{
    int layer_num,neuron_num,j;
    double temp,*input_ptr,*output_ptr;

    input_ptr = activity_ptr;
    output_ptr = activity_ptr + num_node[0];

    /* Feed input forward thru each layer of the network */
    for (layer_num=0;layer_num<3;layer_num++){
        for (neuron_num=0;neuron_num < num_node[layer_num+1];neuron_num++){
            temp = 0.0;
            for (j=0;j < num_node[layer_num];j++){
                temp -= (*weight_ptr++)*(input_ptr[j]);
            }
            temp += *theta_ptr++;
            *output_ptr++ = 1.0/(1.0+exp(temp));
        }
        input_ptr += num_node[layer_num];
    }
    temp = (*input_ptr) * (*weight_ptr);
    return(temp);
}
/****************************************************************/
/*                    FUNCTION CALC_GRADIENT                    */
/****************************************************************/
void calc_gradient(double activity[],double weight[],
                   double theta_gradient[],double gradient[],
                   int num_node[],int num_weights,int num_theta)
{
    int layer_num,i,j,offset;
    double *weight_ptr,*gradient_ptr,*result_gradient_ptr;
    double *output_acty_ptr,*input_acty_ptr,temp,*theta_ptr;

    weight_ptr = &weight[num_weights-1];
    gradient_ptr = &gradient[num_weights-1];
    result_gradient_ptr = gradient_ptr - 1;
    output_acty_ptr = &activity[0] + (num_node[0]+num_node[1]+num_node[2]);
    input_acty_ptr = output_acty_ptr - 1;
    theta_ptr = &theta_gradient[num_theta-1];

    /* Work backward from the output layer to the first layer */
    for (layer_num = 2;layer_num>-1;layer_num--){
        for (j=0;j<num_node[layer_num + 1];j++){
            temp = 0.0;
            offset = 0;
            for (i=0;i<num_node[layer_num+2];i++){
                temp += (*weight_ptr) * (*gradient_ptr);
                weight_ptr -= num_node[layer_num+1];
                gradient_ptr -= num_node[layer_num+1];
            }
            offset = (num_node[layer_num+2]*num_node[layer_num+1])-1;
            weight_ptr += offset;
            gradient_ptr += offset;
            temp *= (1.0 - (*output_acty_ptr--));
            for (i=0;i<num_node[layer_num];i++){
                (*result_gradient_ptr--) = temp * (*input_acty_ptr--);
            }
            *theta_ptr-- = (-temp);
            input_acty_ptr += num_node[layer_num];
        }
        input_acty_ptr -= num_node[layer_num];
    }
    return;
}


/****************************************************************/
/*                    FUNCTION UPDATE_WEIGHTS                   */
/****************************************************************/
void update_weights(double weight[],double alpha,double direction[],
                    int num_inputs)
{
    int i;

    for (i=0;i<num_inputs;i++){
        weight[i] += alpha*direction[i];
    }
    return;
}
/****************************************************************/
/*                    FUNCTION CALC_BETA                        */
/****************************************************************/
double calc_beta(double old_gradient[],double old_theta_gradient[],
                 double new_gradient[],double new_theta_gradient[],
                 int num_inputs,int num_theta)
{
    int i;
    double beta,temp1,temp2;

    temp1 = 0.0;
    temp2 = 0.0;
    for (i=0;i<num_inputs;i++){
        temp1 += ((new_gradient[i]-old_gradient[i])*new_gradient[i]);
        temp2 += (old_gradient[i] * old_gradient[i]);
    }
    for (i=0;i<num_theta;i++){
        temp1 += ((new_theta_gradient[i]-old_theta_gradient[i])*
                  new_theta_gradient[i]);
        temp2 += (old_theta_gradient[i] * old_theta_gradient[i]);
    }
    beta = temp1/temp2;
    if (beta < 0.0){
        beta = 0.0;
    }
    return(beta);
}
/****************************************************************/
/*                    FUNCTION UPDATE_DIRECTION                 */
/****************************************************************/
void update_direction(double gradient[],double direction[],
                      double beta,int num_inputs)
{
    int i;

    for (i=0;i<num_inputs;i++){
        direction[i] *= beta;
        direction[i] -= gradient[i];
    }
    return;
}
/****************************************************************/
/*                    FUNCTION CALC_ALPHA                       */
/****************************************************************/
double calc_alpha(double weight[],double direction[],double theta[],
                  double theta_direction[],double activity[],
                  double data_array[],int array_size,int num_node[],
                  int num_weights,int num_theta)
{
    double a,b,lamda,mu,lamda_result,mu_result,desired_result,epsilon;
    double actual_result,test_weight[500],test_theta[50],*array_ptr;
    int i,k,set_num,max_steps;

    a = 0.0;
    b = 10.0;
    max_steps = 16;
    epsilon = 0.001;
    lamda = a+((b-a)*fibon(max_steps-2)/fibon(max_steps));
    mu = a+((b-a)*fibon(max_steps-1)/fibon(max_steps));

    /* Shift the interval so that lamda sits at the current weights */
    a -= lamda;
    b -= lamda;
    mu -= lamda;
    lamda = 0.0;

    /* Summed squared error at the lamda trial point */
    load_values(weight,test_weight,num_weights);
    load_values(theta,test_theta,num_theta);
    update_weights(test_weight,lamda,direction,(num_weights-1));
    update_weights(test_theta,lamda,theta_direction,num_theta);
    lamda_result = 0.0;
    array_ptr = data_array;
    for (set_num=0;set_num<array_size;set_num++){
        for (i=0;i<num_node[0];i++){
            activity[i] = *array_ptr++;
        }
        desired_result = *array_ptr++;
        actual_result=fire_neurons(activity,test_weight,test_theta,
                                   num_node);
        actual_result -= desired_result;
        actual_result *= actual_result;
        lamda_result += actual_result;
    }
    /* Summed squared error at the mu trial point */
    load_values(weight,test_weight,num_weights);
    load_values(theta,test_theta,num_theta);
    update_weights(test_weight,mu,direction,(num_weights-1));
    update_weights(test_theta,mu,theta_direction,num_theta);
    mu_result = 0.0;
    array_ptr = data_array;
    for (set_num=0;set_num<array_size;set_num++){
        for (i=0;i<num_node[0];i++){
            activity[i] = *array_ptr++;
        }
        desired_result = *array_ptr++;
        actual_result=fire_neurons(activity,test_weight,test_theta,
                                   num_node);
        actual_result -= desired_result;
        actual_result *= actual_result;
        mu_result += actual_result;
    }
    /* Fibonacci search: discard the end of the interval that */
    /* lies beyond the worse of the two trial points          */
    for (k=1;(k<(max_steps-1))&&(b>0.0);k++){
        if (lamda_result > mu_result){
            a = lamda;
            lamda = mu;
            lamda_result = mu_result;
            mu = ((b-a)/fibon(max_steps-k));
            mu *= fibon(max_steps-k-1);
            mu += a;
            load_values(weight,test_weight,num_weights);
            load_values(theta,test_theta,num_theta);
            update_weights(test_weight,mu,direction,(num_weights-1));
            update_weights(test_theta,mu,theta_direction,num_theta);
            mu_result = 0.0;
            array_ptr = data_array;
            for (set_num=0;set_num<array_size;set_num++){
                for (i=0;i<num_node[0];i++){
                    activity[i] = *array_ptr++;
                }
                desired_result = *array_ptr++;
                actual_result=fire_neurons(activity,test_weight,test_theta,
                                           num_node);
                actual_result -= desired_result;
                actual_result *= actual_result;
                mu_result += actual_result;
            }
        }
        else{
            b = mu;
            mu = lamda;
            mu_result = lamda_result;
            lamda = ((b-a)/fibon(max_steps-k));
            lamda *= fibon(max_steps-k-2);
            lamda += a;
            load_values(weight,test_weight,num_weights);
            load_values(theta,test_theta,num_theta);
            update_weights(test_weight,lamda,direction,(num_weights-1));
            update_weights(test_theta,lamda,theta_direction,num_theta);
            lamda_result = 0.0;
            array_ptr = data_array;
            for (set_num=0;set_num<array_size;set_num++){
                for (i=0;i<num_node[0];i++){
                    activity[i] = *array_ptr++;
                }
                desired_result = *array_ptr++;
                actual_result=fire_neurons(activity,test_weight,test_theta,
                                           num_node);
                actual_result -= desired_result;
                actual_result *= actual_result;
                lamda_result += actual_result;
            }
        }
    }
    if (b>0.0){
        /* The last two trial points coincide, so probe lamda + epsilon */
        mu = lamda + epsilon;
        load_values(weight,test_weight,num_weights);
        load_values(theta,test_theta,num_theta);
        update_weights(test_weight,mu,direction,(num_weights-1));
        update_weights(test_theta,mu,theta_direction,num_theta);
        mu_result = 0.0;
        array_ptr = data_array;
        for (set_num=0;set_num<array_size;set_num++){
            for (i=0;i<num_node[0];i++){
                activity[i] = *array_ptr++;
            }
            desired_result = *array_ptr++;
            actual_result=fire_neurons(activity,test_weight,test_theta,
                                       num_node);
            actual_result -= desired_result;
            actual_result *= actual_result;
            mu_result += actual_result;
        }
        if (lamda_result > mu_result){
            if ((lamda+b)> 0.0){
                return((lamda+b)/2.0);
            }
            else{
                return(0.0);
            }
        }
        else{
            if ((lamda+a)> 0.0){
                return((lamda+a)/2.0);
            }
            else{
                return(0.0);
            }
        }
    }
    else{
        return(0.0);
    }
}


/****************************************************************/
/*                    FUNCTION LOAD_VALUES                      */
/****************************************************************/
void load_values(double *input_ptr,double *output_ptr,int total_num)
{
    int i;

    for (i=0;i<total_num;i++){
        *output_ptr++ = *input_ptr++;
    }
    return;
}
/****************************************************************/
/*                    FUNCTION FIBON                            */
/****************************************************************/
int fibon(int n)
{
    int f0,f1,f2,k;

    f2=f1=f0=1;
    if (n < 2){
        return(1);
    }
    for (k=1;k<n;k++){
        f0 = f1 + f2;
        f2 = f1;
        f1 = f0;
    }
    return(f0);
}
/****************************************************************/
/*                    FUNCTION WRITE_RESULT                     */
/****************************************************************/
void write_result(double weight[],double theta[],int num_node[],
                  double ts_data[],int set_size)
{
    FILE *fileptr;
    char fname[14];
    int i,set_num;
    double desired_result,result,activity[50],*array_ptr;

    printf("\n\n Where do you want the results stored? ");
    /* Width-limited scanf stands in for the original flushall()/gets() */
    scanf("%13s",fname);
    printf("\n ** Calculating final results ** \n");
    fileptr = fopen(&fname[0],"w");
    array_ptr = ts_data;
    for (set_num=0;set_num<set_size;set_num++){
        for (i=0;i<num_node[0];i++){
            activity[i] = *array_ptr++;
        }
        desired_result = *array_ptr++;
        result = fire_neurons(activity,weight,theta,num_node);
        fprintf(fileptr," %e %e \n",desired_result,result);
    }
    fclose(fileptr);
    return;
}


/****************************************************************/
/*                    FUNCTION MAP_NETWORK                      */
/****************************************************************/
void map_network(double weight[],double theta[],int num_node[])
{
    int row,col;
    double result,input1,input2,activity[50];
    FILE *fileptr;
    char fname[13];

    printf("\n\n Where do you want the map matrix stored? ");
    /* Width-limited scanf stands in for the original flushall()/gets() */
    scanf("%12s",fname);
    printf("\n ** Calculating map of network **\n");
    fileptr = fopen(&fname[0],"w");
    input1=input2=0.0;
    /* Evaluate the network on a 21 x 21 grid over [0,1] x [0,1] */
    for (row=0;row<21;row++){
        for (col=0;col<21;col++){
            activity[0]=input1;
            activity[1]=input2;
            result=fire_neurons(activity,weight,theta,num_node);
            fprintf(fileptr," %e",result);
            input1 += 0.05;
        }
        fprintf(fileptr,"\n");
        input1 = 0.0;
        input2 += 0.05;
    }
    fclose(fileptr);
    return;
}


/****************************************************************/
/*                    FUNCTION STORE_WEIGHTS                    */
/****************************************************************/
void store_weights(double weight[],double *theta_ptr,int num_node[])
{
    int i,j,k;
    double *weight_ptr1,*weight_ptr2;
    char fname[13];
    FILE *fileptr;

    printf("\n\n Where do you want the final weight/theta values stored? ");
    /* Width-limited scanf stands in for the original flushall()/gets() */
    scanf("%12s",fname);
    printf("\n ** Storing final weight/theta values **\n");
    fileptr = fopen(&fname[0],"w");
    for (i=0;i<3;i++){
        fprintf(fileptr,"%4d",num_node[i]);
    }
    fprintf(fileptr," \n");
    weight_ptr2 = weight;
    for (i=0;i<3;i++){
        weight_ptr1 = weight_ptr2;
        for (j=0;j<num_node[i];j++){
            weight_ptr1 = weight_ptr2 + j;
            for (k=0;k<num_node[i+1];k++){
                fprintf(fileptr,"%10.6lf ",*weight_ptr1);
                weight_ptr1 += num_node[i];
            }
            fprintf(fileptr," \n");
        }
        weight_ptr2 += (num_node[i]*num_node[i+1]);
        for (j=0;j<num_node[i+1];j++){
            fprintf(fileptr,"%10.6lf ",*theta_ptr++);
        }
        fprintf(fileptr," \n");
    }
    fprintf(fileptr,"%10.6lf \n",*weight_ptr2);
    fclose(fileptr);
    return;
}
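
The line search in calc_alpha narrows a bracketing interval using ratios of Fibonacci numbers. As a hedged, standalone illustration that is not part of the thesis program, the sketch below applies the same interval reduction to a plain one-dimensional function. The names `objective_fn`, `fib_number`, `fib_search`, and `parabola` are hypothetical, and the sketch assumes the objective is unimodal on the starting interval. It simply returns the midpoint of the final interval instead of the epsilon probe used in the listing.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical objective type: a unimodal function of one variable. */
typedef double (*objective_fn)(double);

/* Fibonacci numbers with F(0) = F(1) = 1, the convention fibon() uses. */
int fib_number(int n)
{
    int f0 = 1, f1 = 1, k;
    for (k = 1; k < n; k++) {
        int next = f0 + f1;
        f0 = f1;
        f1 = next;
    }
    return f1;
}

/* Example objective with its minimum at x = 2. */
double parabola(double x)
{
    return (x - 2.0) * (x - 2.0);
}

/* Shrink [a, b] around the minimiser of f with Fibonacci ratios and
   return the midpoint of the final interval. One new evaluation of f
   is made per step; the other trial point is reused. */
double fib_search(objective_fn f, double a, double b, int steps)
{
    double lam, mu, f_lam, f_mu;
    int k;

    assert(steps >= 3 && b > a);
    lam = a + (b - a) * fib_number(steps - 2) / fib_number(steps);
    mu  = a + (b - a) * fib_number(steps - 1) / fib_number(steps);
    f_lam = f(lam);
    f_mu  = f(mu);
    for (k = 1; k < steps - 1; k++) {
        if (f_lam > f_mu) {
            /* The minimum lies in [lam, b]: drop the left end. */
            a = lam;
            lam = mu;
            f_lam = f_mu;
            mu = a + (b - a) * fib_number(steps - k - 1)
                             / fib_number(steps - k);
            f_mu = f(mu);
        } else {
            /* The minimum lies in [a, mu]: drop the right end. */
            b = mu;
            mu = lam;
            f_mu = f_lam;
            lam = a + (b - a) * fib_number(steps - k - 2)
                              / fib_number(steps - k);
            f_lam = f(lam);
        }
    }
    return (a + b) / 2.0;
}
```

Calling `fib_search(parabola, 0.0, 10.0, 16)` mirrors the interval [0, 10] and the 16 steps that calc_alpha starts from. With those settings the final interval has length 2(b - a)/F(16), roughly 0.0125, so the returned midpoint lies within about 0.006 of the true minimiser.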